jacob mathew

Reputation: 153

Annotating sentence with BILOU tags for spaCy

How should I annotate (define entities in) the following sentence with BILOU tags? In particular, how should I handle special characters and punctuation that are attached to words without a space, given that BILOU doesn't provide character positions? For example: (Principal, (Co-investigator), Dr., etc.

Dr. med. XYZ DEF (Principal Investigator) XYZ ABC (Co-investigator), Dr. med. XYZ RST (Independent Rater)

Should I consider (Principal a single entity?

Upvotes: 2

Views: 1753

Answers (1)

polm23

Reputation: 15633

For BILOU tagging you need to have pre-tokenized text. Whether (Principal is one token or two depends on your tokenizer, but it would usually be split.

Here's an example of BILOU tagging, using spaCy's default English model for tokenization and some basic tags:

Dr.    O
med    O
.    O
XYZ    B-PERSON
DEF    L-PERSON
(    O
Principal    B-ROLE
Investigator    L-ROLE
)    O
XYZ    B-PERSON
ABC    L-PERSON
(    O
Co    B-ROLE
-    I-ROLE
investigator    L-ROLE
)    O
,    O
Dr.    O
med    O
.    O
XYZ    B-PERSON
RST    L-PERSON
(    O
Independent    B-ROLE
Rater    L-ROLE
)    O

If you're using spaCy, you can instead specify NER labels as character ranges in your training data, which should make the annotations less sensitive to how the tokenizer splits the text. For details, see the training documentation.
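To illustrate how character ranges relate to BILOU tags, here is a rough pure-Python sketch. The regex tokenizer and the `tokenize`, `spans_to_bilou`, and `find_span` helpers are made up for this example and are not spaCy's tokenizer or API. The tokenizer only approximates the splits shown above. In spaCy v3 itself, `spacy.training.offsets_to_biluo_tags` does this kind of conversion for you.

```python
import re

# Hypothetical stand-in tokenizer (NOT spaCy's): keeps "Dr." together,
# otherwise splits into runs of word characters or single punctuation marks.
TOKEN_RE = re.compile(r"Dr\.|\w+|[^\w\s]")


def tokenize(text):
    """Return (token, start, end) triples with character offsets."""
    return [(m.group(), m.start(), m.end()) for m in TOKEN_RE.finditer(text)]


def spans_to_bilou(text, spans):
    """Convert (start, end, label) character spans to one BILOU tag per token."""
    tokens = tokenize(text)
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        # Indices of the tokens that lie fully inside the character span.
        inside = [i for i, (_, s, e) in enumerate(tokens) if s >= start and e <= end]
        if not inside:
            continue
        # Skip spans that do not line up with token boundaries.
        if tokens[inside[0]][1] != start or tokens[inside[-1]][2] != end:
            continue
        if len(inside) == 1:
            tags[inside[0]] = f"U-{label}"      # Unit: single-token entity
        else:
            tags[inside[0]] = f"B-{label}"      # Begin
            for i in inside[1:-1]:
                tags[i] = f"I-{label}"          # Inside
            tags[inside[-1]] = f"L-{label}"     # Last
    return [(tok, tag) for (tok, _, _), tag in zip(tokens, tags)]


def find_span(text, phrase, label):
    """Build a (start, end, label) span for the first occurrence of phrase."""
    start = text.index(phrase)
    return (start, start + len(phrase), label)


text = "Dr. med. XYZ DEF (Principal Investigator) XYZ ABC (Co-investigator)"
spans = [
    find_span(text, "XYZ DEF", "PERSON"),
    find_span(text, "Principal Investigator", "ROLE"),
    find_span(text, "XYZ ABC", "PERSON"),
    find_span(text, "Co-investigator", "ROLE"),
]

for token, tag in spans_to_bilou(text, spans):
    print(f"{token}\t{tag}")
```

Note that the parentheses never need to be part of the annotation: the span for "Principal Investigator" starts after the "(", so the bracket ends up as its own token tagged O.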

Upvotes: 1
