Tructed BIO format in NER prediction results

Question

Hi all,

I have an NER model that was fine-tuned from BERT on sentences annotated with the BIO tagging scheme.

Here is an example from my training data (the token list, followed by its label list):

"['The', 'raw', 'datafiles', 'used', 'in', 'this', 'study', 'were', 'obtained', 'from', 'the', 'EMBL', '-', 'EBI', 'ArrayExpress', '[', '70', ']', ',', 'or', 'NCBI', 'Gene', 'Expression', 'Omnibus', '(', 'GEO', ')', '[', '71', ']', 'websites', '.']",

"['O', 'O', 'O', 'O',  'O', 'O', 'O', 'O', 'O', 'B-Operation', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']"

My predictions are produced at the token level in BIO format. However, there are two issues with the output:

Different entity types are mixed within one span. In the example below, a Means entity is inserted in the middle of a Data entity:

the (B-Data) mo (B-Means) fa (I-Means) 2 (I-Means) model (I-Data)

A predicted entity starts with an I- tag rather than a B- tag:

blinded scoring (I-Operation) of
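
To make the two cases concrete, here is a minimal sketch of how such invalid sequences could be flagged and patched after prediction. It assumes the tags are plain strings of the form `O`, `B-Type`, or `I-Type`; the function names `find_bio_violations` and `repair_bio` are hypothetical and not part of any library, and the tag lists in the usage lines are my own reading of the examples above. The repair is only one common post-processing heuristic (promote a dangling `I-` tag to `B-`); it hides the symptom and does not explain why the model emits these sequences.

```python
from typing import List, Tuple


def find_bio_violations(tags: List[str]) -> List[Tuple[int, str]]:
    """Return (index, tag) for every I- tag that does not continue
    an entity of the same type on the previous token."""
    violations = []
    prev_type = None  # entity type of the previous token, None if it was "O"
    for i, tag in enumerate(tags):
        if tag == "O":
            prev_type = None
            continue
        prefix, _, ent_type = tag.partition("-")
        if prefix == "I" and ent_type != prev_type:
            violations.append((i, tag))
        prev_type = ent_type
    return violations


def repair_bio(tags: List[str]) -> List[str]:
    """Heuristic repair (an assumption, not a root-cause fix):
    turn each dangling I-Type into B-Type."""
    repaired = list(tags)
    for i, tag in find_bio_violations(tags):
        repaired[i] = "B-" + tag[2:]
    return repaired


# Case 1: a Means entity interrupts a Data entity.
mixed = ["B-Data", "B-Means", "I-Means", "I-Means", "I-Data"]
print(find_bio_violations(mixed))
print(repair_bio(mixed))

# Case 2: an entity starts with I- (assuming the neighbours are tagged O).
dangling = ["O", "I-Operation", "O"]
print(find_bio_violations(dangling))
print(repair_bio(dangling))
```

Note that the repair splits the first example into three separate entities (Data, Means, Data) rather than merging them back into one Data span, so it makes the output well-formed but not necessarily correct.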

Why does this happen, and how can I resolve it? Thanks.
