I am building a custom NER model with spaCy to recognize new entity types beyond those covered by spaCy's built-in NER. I tag my training data and add it using spacy.training.Example, following the BILOU scheme. My question is about entities that span more than three words. For example:
Housing Development Finance Corporation reported heavy losses in the past quarter.
I want to tag "Housing Development Finance Corporation" as a single entity using the BILOU scheme, something like:
'Housing' B-Entity
'Development' I-Entity
'Finance' I-Entity
'Corporation' L-Entity
Is this tagging correct? How will the model interpret the order of the words within each entity? Any guidance would be much appreciated.
The tagging you have is correct. Every word outside an entity is tagged O.
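To see how BILOU tags line up with a whole sentence, here is a minimal pure-Python sketch that turns a character-offset annotation into token-level tags. It assumes plain whitespace tokenization (so "quarter." keeps its period), and the helper name biluo_tags and the label Entity are hypothetical. In practice, spaCy's spacy.training.offsets_to_biluo_tags(doc, entities) does the equivalent on spaCy's own tokenization.

```python
from typing import List, Tuple


def biluo_tags(text: str, entities: List[Tuple[int, int, str]]) -> List[Tuple[str, str]]:
    """Tag whitespace-separated tokens with BILUO labels from character offsets.

    Hypothetical helper for illustration only. Tokens that are not fully
    inside an entity span stay "O".
    """
    # Find the (start, end) character offsets of each whitespace token.
    tokens = []
    pos = 0
    for word in text.split():
        start = text.index(word, pos)
        end = start + len(word)
        tokens.append((word, start, end))
        pos = end

    tags = ["O"] * len(tokens)
    for ent_start, ent_end, label in entities:
        # Indices of tokens lying fully inside this entity span.
        inside = [i for i, (_, s, e) in enumerate(tokens) if s >= ent_start and e <= ent_end]
        if len(inside) == 1:
            tags[inside[0]] = f"U-{label}"      # single-token entity
        elif inside:
            tags[inside[0]] = f"B-{label}"      # first token
            for i in inside[1:-1]:
                tags[i] = f"I-{label}"          # middle tokens
            tags[inside[-1]] = f"L-{label}"     # last token
    return [(word, tag) for (word, _, _), tag in zip(tokens, tags)]


sentence = "Housing Development Finance Corporation reported heavy losses in the past quarter."
name = "Housing Development Finance Corporation"
for word, tag in biluo_tags(sentence, [(0, len(name), "Entity")]):
    print(word, tag)
```

A multi-word entity gets B, any number of I tags, then L; a one-token entity would get a single U tag instead.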
The model relies on the order of the words within the entity, so a span whose words appear in a different order will not be matched to a previously seen entity of the same name. For example:
'Housing' B-Entity
'Development' I-Entity
'Finance' I-Entity
'Corporation' L-Entity
and
'Housing' B-Entity
'Finance' I-Entity
'Development' I-Entity
'Corporation' L-Entity
will not be linked as the same entity. If you do want them treated as the same entity, you could look into a classification model that maps each found entity to one of your previously known entities, and work from there.
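The NER step only returns spans, so deciding which known entity a span refers to happens afterwards. As a deliberately simple rule-based stand-in for the classification model suggested above (not a trained classifier), this sketch links spans by comparing their lowercased sets of words, so a reordered name maps to the same ID. The functions order_free_key and link_entity, and the ID "HDFC", are hypothetical.

```python
from typing import Dict, FrozenSet, Optional


def order_free_key(span_text: str) -> FrozenSet[str]:
    # Lowercased set of words: ignores word order (and repeated words).
    return frozenset(span_text.lower().split())


def link_entity(span_text: str, known: Dict[FrozenSet[str], str]) -> Optional[str]:
    # Return the ID of a known entity with exactly the same words, or None.
    return known.get(order_free_key(span_text))


# Hypothetical table of previously known entities.
known_entities = {
    order_free_key("Housing Development Finance Corporation"): "HDFC",
}

print(link_entity("Housing Finance Development Corporation", known_entities))
print(link_entity("State Bank of India", known_entities))
```

Treating any reordering as identical can over-merge distinct names, and it does nothing for abbreviations or typos; for fuzzier matching a trained classifier (or spaCy's entity_linker component, backed by a knowledge base) is the more robust route.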