Since the beginning of error analysis as an applied linguistics discipline, a variety of error taxonomies have been proposed. Broadly speaking, they are based either on linguistic categories or on surface-structure descriptions (Dobric and Sigott 2014; Sigott, Cesnik and Dobric 2016; Pibal and Sigott, forthcoming). One of the problems inherent in these taxonomies is the high degree of subjectivity and the resulting lack of annotator agreement. The Scope – Substance error taxonomy takes an alternative approach, describing errors in terms of two concepts, scope and substance. In principle, this distinction was already suggested by Lennon (1991), who used ‘extent’ to refer to scope and ‘domain’ to refer to substance. Scope refers to the linguistic and extralinguistic context that needs to be taken into account for an error to be noticed, whereas substance designates the size of the linguistic structure that needs to be changed in order for the error to disappear. Levels of scope and substance are described in terms of word, phrase, clause, sentence and text (Quirk et al. 1985). This enables errors to be described in terms of fourteen combinations of scope and substance. We contend that this approach to error description should leave less room for individual interpretation because the model of grammatical analysis that it is based on is made explicit.
In order to investigate annotator agreement reached on the basis of the taxonomy, a pilot study with trained student annotators was conducted. Annotator agreement was expressed in terms of Error Location Density Indices developed for this purpose. The pilot study showed that, while the approach has potential, annotator agreement was still low. This was attributed to a lack of detail in the instructions provided for annotators. Consequently, the guidelines for the application of the taxonomy were refined. They now contain instructions for setting up an authoritative reconstruction of the learner text by applying the principle of minimal correction, as well as instructions for dealing with unitary constituency, nested error, multi-level error, multiple error, missing constituents and superfluous constituents. Moreover, guidelines for dealing with punctuation errors have been formulated.
Currently, a second pilot study of the taxonomy, using the refined guidelines, is in progress. The same learner texts will be used. In addition to a student annotator group, a second annotator group will be recruited from university language teaching staff in order to investigate the effect of differences in language proficiency. The results will be available later in the year and will be reported in the presentation.
Keywords: error annotation, learner corpora, inter-annotator agreement, error location density index, second language writing