Hi all,
I have an NER model, fine-tuned from BERT on sentences annotated with the BIO scheme.
Here is an example from my training data (tokens first, then labels):
"['The', 'raw', 'datafiles', 'used', 'in', 'this', 'study', 'were', 'obtained', 'from', 'the', 'EMBL', '-', 'EBI', 'ArrayExpress', '[', '70', ']', ',', 'or', 'NCBI', 'Gene', 'Expression', 'Omnibus', '(', 'GEO', ')', '[', '71', ']', 'websites', '.']",
"['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-Operation', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']"
My predictions come out at the token level in BIO format. However, there are some issues with the output:
1. A Means tag inserted into the middle of a Data entity:
   the (B-Data) mo (B-Means) fa (I-Means) 2 (I-Means) model (I-Data)
2. An entity that starts with an I- tag rather than a B- tag:
   blinded scoring (I-Operation) of
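To make the two patterns concrete, here is a minimal sketch of a check that flags them in a predicted tag sequence. The function name `find_bio_violations` is hypothetical and not part of any library, and the second tag list assumes the tokens around "blinded scoring" were predicted as O, which the output above does not show.

```python
from typing import List, Tuple


def find_bio_violations(tags: List[str]) -> List[Tuple[int, str]]:
    """Return (index, reason) pairs for tags that break the BIO scheme.

    Hypothetical helper for illustration only.
    """
    violations = []
    prev_type = None  # entity type of the previous tag; None if it was 'O'
    for i, tag in enumerate(tags):
        if tag == "O":
            prev_type = None
            continue
        # Split 'B-Data' into prefix 'B' and entity type 'Data'
        prefix, _, ent_type = tag.partition("-")
        if prefix == "I" and prev_type is None:
            violations.append((i, "I- tag without a preceding B- tag"))
        elif prefix == "I" and prev_type != ent_type:
            violations.append((i, "I- tag continues a different entity type"))
        prev_type = ent_type
    return violations


# Issue 1: a Means entity interrupts a Data entity
tags_1 = ["B-Data", "B-Means", "I-Means", "I-Means", "I-Data"]
# Issue 2: entity opens with I- (surrounding 'O' tags are an assumption)
tags_2 = ["O", "I-Operation", "O"]

print(find_bio_violations(tags_1))
print(find_bio_violations(tags_2))
```

The check only looks at the tag sequence, so it detects where the BIO constraints are broken but does not say why the model produced those tags.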
I wonder why this is happening and how to resolve this. Thanks.