How should I annotate (define entities in) the following sentence with BILOU tags?
In particular, how should I handle special characters and punctuation that are attached to words without a space, given that BILOU tags are assigned per token and don't record character positions? For example "(Principal", "(Co-investigator)", "Dr.", etc.
Dr. med. XYZ DEF (Principal Investigator) XYZ ABC (Co-investigator), Dr. med. XYZ RST (Independent Rater)
Should I treat "(Principal" as a single entity?
For BILOU tagging you need pre-tokenized text. Whether "(Principal" is one token or two depends on your tokenizer, but most tokenizers (spaCy's included) split the leading parenthesis off into its own token.
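To see why the parenthesis usually ends up as a separate token, here is a deliberately simplified regex tokenizer. It is a hypothetical stand-in, not spaCy's actual tokenizer rules: it keeps "Dr." together as an abbreviation and otherwise splits every punctuation character off from the neighbouring words.

```python
import re

# Hypothetical, simplified tokenizer (NOT spaCy's rules):
# the abbreviation "Dr.", runs of word characters, or any single
# punctuation character that is not whitespace.
TOKEN_RE = re.compile(r"Dr\.|\w+|[^\w\s]")


def tokenize(text):
    """Split text into tokens, detaching punctuation such as '(' and ')'."""
    return TOKEN_RE.findall(text)


print(tokenize("XYZ DEF (Principal Investigator)"))
# → ['XYZ', 'DEF', '(', 'Principal', 'Investigator', ')']
```

Once "(" is its own token it is simply tagged O, and the entity starts at "Principal".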
Here's an example of BILOU tagging using spaCy with the default English model and some basic tags:
Dr. O
med O
. O
XYZ B-PERSON
DEF L-PERSON
( O
Principal B-ROLE
Investigator L-ROLE
) O
XYZ B-PERSON
ABC L-PERSON
( O
Co B-ROLE
- I-ROLE
investigator L-ROLE
) O
, O
Dr. O
med O
. O
XYZ B-PERSON
RST L-PERSON
( O
Independent B-ROLE
Rater L-ROLE
) O
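A minimal sketch of how span annotations map onto tags like the ones above, assuming the entities are given as token-index ranges (start inclusive, end exclusive) over an already tokenized sentence and do not overlap. The helper name spans_to_bilou is hypothetical. It also emits U- for single-token entities, a case the example above happens not to need.

```python
def spans_to_bilou(n_tokens, spans):
    """Convert non-overlapping entity spans to BILOU tags.

    spans: list of (start, end, label) token indices, end exclusive.
    Every token outside a span is tagged "O".
    """
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        if end - start == 1:
            tags[start] = f"U-{label}"  # single-token entity
        else:
            tags[start] = f"B-{label}"  # first token
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{label}"  # inner tokens
            tags[end - 1] = f"L-{label}"  # last token
    return tags


tokens = ["XYZ", "ABC", "(", "Co", "-", "investigator", ")", ","]
spans = [(0, 2, "PERSON"), (3, 6, "ROLE")]  # the parentheses stay outside
for tok, tag in zip(tokens, spans_to_bilou(len(tokens), spans)):
    print(tok, tag)
```

The parentheses are excluded simply by choosing span boundaries that start at "Co" and end at "investigator".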
If you're using spaCy, you can specify NER labels in the training data as character offsets (start, end, label) instead of per-token tags, which makes the annotations less sensitive to exactly how the tokenizer splits the text. For details, see the spaCy training documentation.
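The character-offset approach can be sketched in plain Python: entities are annotated as (start, end, label) character ranges and then projected onto whatever tokens the tokenizer produced. Both the tokenizer and the offsets_to_bilou helper below are hypothetical. spaCy v3 ships its own converter (spacy.training.offsets_to_biluo_tags), and this sketch only imitates it in spirit, including tagging tokens "-" when an entity's edges don't fall on token boundaries.

```python
import re

# Hypothetical tokenizer that also records character offsets (not spaCy's).
TOKEN_RE = re.compile(r"Dr\.|\w+|[^\w\s]")


def tokenize_with_offsets(text):
    """Return (token_text, start_char, end_char) triples."""
    return [(m.group(), m.start(), m.end()) for m in TOKEN_RE.finditer(text)]


def offsets_to_bilou(text, entities):
    """Project character-offset entities (start, end, label) onto BILOU tags.

    Tokens overlapping an entity whose edges don't fall on token boundaries
    are tagged "-" (misaligned) instead of being silently mislabeled.
    """
    tokens = tokenize_with_offsets(text)
    starts = {s: i for i, (_, s, _) in enumerate(tokens)}
    ends = {e: i for i, (_, _, e) in enumerate(tokens)}
    tags = ["O"] * len(tokens)
    for start, end, label in entities:
        if start not in starts or end not in ends:
            for i, (_, s, e) in enumerate(tokens):
                if s < end and e > start:
                    tags[i] = "-"  # entity edge cuts through this token
            continue
        first, last = starts[start], ends[end]
        if first == last:
            tags[first] = f"U-{label}"
        else:
            tags[first] = f"B-{label}"
            for i in range(first + 1, last):
                tags[i] = f"I-{label}"
            tags[last] = f"L-{label}"
    return [tok for tok, _, _ in tokens], tags


text = "XYZ DEF (Principal Investigator)"
role = "Principal Investigator"
role_start = text.index(role)
entities = [(0, 7, "PERSON"), (role_start, role_start + len(role), "ROLE")]
for tok, tag in zip(*offsets_to_bilou(text, entities)):
    print(tok, tag)
```

An annotation covering "(Principal" (characters 8 to 18 in this sentence) would still align with this tokenizer, but it would tag "(" as B-ROLE; starting the span at "Principal" keeps the punctuation outside the entity.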