Keep alignments in Named Entity Recognition tasks after cleaning text

Question

I am working on a Named Entity Recognition (NER) task in which the entities are annotated in BRAT format (.txt + .ann). I clean the texts with regular expressions before passing them to my model, but whenever I modify the text I also have to shift the entity offsets in the annotations so that they still match. That direction is relatively straightforward, and afterwards I can use my NLP model to classify the different entity classes. However, once I have the model's predictions I need to map the recognized entities back onto the original text, i.e. convert offsets in the cleaned text into the offsets that were valid before the regular expressions were applied. Is there a way to keep track of the original offsets while cleaning the text?
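
To make the problem concrete, here is a minimal sketch of one possible approach: apply each regular expression manually and carry along, for every character of the cleaned text, the span of the original text it came from. The names `clean_with_alignment` and `to_original` are hypothetical helpers made up for this illustration, not part of BRAT or any library, and the sketch assumes the cleaning rules are plain `(pattern, replacement)` pairs with string replacements, applied in order.

```python
import re


def clean_with_alignment(text, rules):
    """Apply (pattern, replacement) rules in order.

    Returns the cleaned text and a list `spans` in which spans[i] is the
    (start, end) range of the ORIGINAL text that cleaned character i covers.
    Hypothetical helper, written only for this example.
    """
    # Before any cleaning, every character maps to itself.
    spans = [(i, i + 1) for i in range(len(text))]
    for pattern, repl in rules:
        new_text, new_spans, pos = [], [], 0
        for m in re.finditer(pattern, text):
            a, b = m.span()
            # Copy the untouched characters together with their spans.
            new_text.append(text[pos:a])
            new_spans.extend(spans[pos:a])
            if b > a:
                # Replacement characters cover the whole matched region.
                src = (spans[a][0], spans[b - 1][1])
            else:
                # Empty match: a pure insertion at a single original position.
                if a < len(spans):
                    p = spans[a][0]
                else:
                    p = spans[-1][1] if spans else 0
                src = (p, p)
            out = m.expand(repl)  # resolves backreferences such as \1
            new_text.append(out)
            new_spans.extend([src] * len(out))
            pos = b
        new_text.append(text[pos:])
        new_spans.extend(spans[pos:])
        text, spans = "".join(new_text), new_spans
    return text, spans


def to_original(spans, start, end):
    """Map a non-empty [start, end) span in the cleaned text to the original."""
    return spans[start][0], spans[end - 1][1]


original = "Mr.  John   Smith took\taspirin."
cleaned, spans = clean_with_alignment(original, [(r"\s+", " ")])

# Pretend the model predicted "John Smith" in the cleaned text.
start = cleaned.index("John Smith")
end = start + len("John Smith")
o_start, o_end = to_original(spans, start, end)
print(repr(original[o_start:o_end]))  # 'John   Smith'
```

Because each replacement character is mapped to the full region it replaced, an entity boundary that falls on replaced text is widened to cover that whole region in the original, which seems the safest choice when writing offsets back to the .ann file.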

Answers (0)
