demongolem

Reputation: 9708

How do you order annotations by offset in brat?

When using the brat rapid annotation tool, it appears that the generated annotation (.ann) file lists the annotations in the order in which the user created them. If you start at the beginning of a document and annotate through to the end, the annotations will naturally be in offset order. However, if you go back to an earlier part of the document and add another annotation, the annotations in the output .ann file will no longer be in offset order.
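To confirm the symptom described above before rewriting anything, a small check like the following can be used. It is only a sketch: it assumes the standard brat standoff layout for text-bound lines (`T#<TAB>Type Start End<TAB>Text`, or `Type 0 5;10 15` for discontinuous spans), ignores every other line type, and `first_offset` / `is_offset_ordered` are hypothetical helper names.

```python
def first_offset(ann_line):
    """Return the begin offset of a text-bound (T) line, or None for other lines."""
    fields = ann_line.rstrip('\n').split('\t')
    if not fields[0].startswith('T') or len(fields) < 2:
        return None
    # Second field is "Type Start End" (or "Type 0 5;10 15" for a discontinuous span)
    return int(fields[1].split(' ')[1])


def is_offset_ordered(ann_text):
    """True if the T annotations appear in non-decreasing begin-offset order."""
    offsets = [o for o in map(first_offset, ann_text.splitlines()) if o is not None]
    return all(a <= b for a, b in zip(offsets, offsets[1:]))


# "Bob" (offset 0) was annotated after "Alice" (offset 10), so the file is out of order
sample = "T1\tPerson 10 15\tAlice\nT2\tPerson 0 3\tBob\n"
print(is_offset_ordered(sample))  # False
```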

How, then, can you rearrange the .ann file so that the annotations are in offset order when you are done? Is there an option within brat that does this, or does one have to write one's own script?

Upvotes: 1

Views: 299

Answers (1)

demongolem

Reputation: 9708

Hearing nothing, I wrote a Python script to accomplish this. First, it reorders all annotations by begin offset. Second, it renumbers the labels so that they are once again in ascending order.

"""Sort the text-bound (T) annotations of a brat .ann file by offset and renumber them.

Usage: python sort_ann.py input.ann output.ann
"""
import argparse

splitchar1 = '\t'  # separates the label, "Type Start End", and the text
splitchar2 = ' '   # separates the type, start offset and end offset

# for brat, overlapped is not permitted (or at least a warning is generated)
# so sorting on begin alone is enough; end is used only as a tie-breaker.
class AnnotationRecord:
    def __init__(self, label='T0', type_='', begin=-1, end=-1, text=''):
        self.label = label
        self.type = type_
        self.begin = begin
        self.end = end
        self.text = text

    def __repr__(self):
        # Rebuild the line in brat's "T#<TAB>Type Start End<TAB>Text" format
        return (self.label + splitchar1
                + self.type + splitchar2
                + str(self.begin) + splitchar2
                + str(self.end) + splitchar1 + self.text)

def create_record(parts):
    middle_parts = parts[1].split(splitchar2)
    # Store offsets as ints so they sort numerically rather than as strings
    return AnnotationRecord(label=parts[0],
                            type_=middle_parts[0],
                            begin=int(middle_parts[1]),
                            end=int(middle_parts[2]),
                            text=parts[2])

def main(filename, out_filename):
    with open(filename, 'r', encoding='utf-8') as fo:
        # Skip blank lines and drop the trailing newline from each line
        lines = [line.rstrip('\n') for line in fo if line.strip()]

    # Note: this assumes every line is a contiguous text-bound (T) annotation.
    # Relations, events, attributes, notes and discontinuous spans ("0 5;10 15")
    # are not handled, and references to T ids elsewhere are not updated.
    annotation_records = [create_record(line.split(splitchar1, 2)) for line in lines]

    # sort based upon begin (then end)
    sorted_records = sorted(annotation_records, key=lambda a: (a.begin, a.end))

    # now relabel based upon the sorted order: T1, T2, ...
    for label_value, sorted_record in enumerate(sorted_records, start=1):
        sorted_record.label = 'T' + str(label_value)

    # now write the resulting file to disk, one annotation per line
    with open(out_filename, 'w', encoding='utf-8') as fo:
        for sorted_record in sorted_records:
            fo.write(repr(sorted_record) + '\n')

# format of each .ann line is T#<TAB>Type Start End<TAB>Text
# args are input file, output file
if __name__ == '__main__':
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument('input', help='input .ann file')
    parser.add_argument('output', help='output .ann file')
    parser.add_argument('-v', '--verbose', action='store_true',
                        help='verbose output (currently unused)')
    args = parser.parse_args()
    main(args.input, args.output)
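One caveat with the script above: it treats every line of the .ann file as a contiguous text-bound (T) annotation. brat files often also contain relations (R), events (E), attributes (A), equivalences (*), normalizations (N) and notes (#), many of which refer to T ids, so renumbering the T lines without updating those references would silently break them. The following sketch, which is not part of the original answer, sorts the T lines by the first (begin, end) pair of their span, renumbers them, and rewrites T-id references in the second field of every other line. `reorder_ann` and `span_key` are hypothetical names, and it assumes the standard brat standoff layout and that any token like `T3` in a reference field means a text-bound annotation (a normalization database ID that happens to look like `T12` would also be rewritten).

```python
import re

# A reference to a text-bound annotation id, e.g. the "T1" in "Arg1:T1"
REF = re.compile(r'\bT\d+\b')


def span_key(fields):
    """Sort key for a T line: (begin, end) of the first fragment of its span."""
    offsets = fields[1].split(' ', 1)[1]           # e.g. "0 5" or "0 5;10 15"
    begin, end = offsets.split(';')[0].split(' ')
    return int(begin), int(end)


def reorder_ann(ann_text):
    lines = [line for line in ann_text.splitlines() if line.strip()]
    t_lines = [line.split('\t') for line in lines if line.startswith('T')]
    other_lines = [line.split('\t') for line in lines if not line.startswith('T')]

    t_lines.sort(key=span_key)
    # Old id -> new id in sorted order; re.sub applies it in one pass, so swaps are safe
    mapping = {fields[0]: 'T%d' % i for i, fields in enumerate(t_lines, start=1)}

    out = ['\t'.join([mapping[fields[0]]] + fields[1:]) for fields in t_lines]
    for fields in other_lines:
        # Only the second field holds ids; free text (e.g. note bodies) is left alone
        if len(fields) > 1:
            fields[1] = REF.sub(lambda m: mapping.get(m.group(0), m.group(0)), fields[1])
        out.append('\t'.join(fields))
    return ''.join(line + '\n' for line in out)


sample = ("T1\tPerson 10 15\tAlice\n"
          "T2\tPerson 0 3\tBob\n"
          "R1\tKnows Arg1:T1 Arg2:T2\n"
          "#1\tAnnotatorNotes T1\tsee T2\n")
print(reorder_ann(sample), end='')
```

For the sample, Bob becomes T1 and Alice T2, the relation is rewritten to `Arg1:T2 Arg2:T1`, and the note is reattached to T2 while its free text is unchanged. Non-T lines keep their original order and ids, since only the T annotations need to follow offset order here.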

Upvotes: 2
