FewKey

Reputation: 191

Broken BIO format in NER prediction results

I have an NER model fine-tuned from BERT on sentences annotated with the BIO tagging scheme.

Here is an example from my training data (tokens, then labels):

"['The', 'raw', 'datafiles', 'used', 'in', 'this', 'study', 'were', 'obtained', 'from', 'the', 'EMBL', '-', 'EBI', 'ArrayExpress', '[', '70', ']', ',', 'or', 'NCBI', 'Gene', 'Expression', 'Omnibus', '(', 'GEO', ')', '[', '71', ']', 'websites', '.']",

"['O', 'O', 'O', 'O',  'O', 'O', 'O', 'O', 'O', 'B-Operation', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']"

My predictions are produced at the token level in BIO format. However, there are some issues with the output:

  1. Tags of different entity types are mixed within one span. Below, a Means entity is inserted in the middle of a Data entity:
the (B-Data) mo (B-Means) fa (I-Means) 2 (I-Means) model (I-Data)
  2. A predicted entity starts with an I- tag rather than a B- tag:
blinded scoring (I-Operation) of
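
To make the two cases concrete, here is a minimal sketch of a check that flags them. It is not part of my training or inference code. The function name `find_bio_violations` is made up for illustration, and in the second example I am assuming the tokens around "scoring" are tagged O. Both issues reduce to the same rule: an I-X tag is only valid if the previous tag is B-X or I-X.

```python
def find_bio_violations(tags):
    """Return (index, tag) pairs where an I- tag does not continue
    an entity of the same type (previous tag is not B-X or I-X)."""
    violations = []
    prev = "O"  # treat the start of the sequence as outside any entity
    for i, tag in enumerate(tags):
        if tag.startswith("I-"):
            entity = tag[2:]
            if prev not in (f"B-{entity}", f"I-{entity}"):
                violations.append((i, tag))
        prev = tag
    return violations


# Issue 1: I-Data follows I-Means, so the Data span is interrupted.
print(find_bio_violations(["B-Data", "B-Means", "I-Means", "I-Means", "I-Data"]))
# [(4, 'I-Data')]

# Issue 2: I-Operation with no preceding B-Operation (neighbours assumed to be O).
print(find_bio_violations(["O", "I-Operation", "O"]))
# [(1, 'I-Operation')]
```

Note that the B-Means right after B-Data in the first example is not flagged: under plain BIO rules a B- tag may start a new entity anywhere, so only the trailing I-Data is formally invalid.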

Why is this happening, and how can I resolve it? Thanks.

Upvotes: 0

Views: 50

Answers (0)