Ashitosh Bhosale

Reputation: 73

How to solve ValueError: [E177] Ill-formed IOB input detected: an?

I am trying to convert CoNLL-format data into spaCy's JSON format to train a model.

I am using spaCy's convert command for this. I ran:

      python -m spacy convert conll_dataset.tsv /Users/user/docs -t json -c ner

I get a ValueError:

     ValueError: [E177] Ill-formed IOB input detected: in

I deleted every occurrence of 'in' in the dataset and tried again, but got the same error with a different word:

     ValueError: [E177] Ill-formed IOB input detected: an

Please help me with this problem. My dataset looks like this:

     Abhishek   Name
     Jha    Name
     Application    Designation
     Development    Designation
     Associate  Designation

I am using spaCy 2.3.2.

Upvotes: 2

Views: 807

Answers (1)

polm23

Reputation: 15593

IOB format means a tag is either blank, "O", or a "B-"/"I-" prefix followed by a label, such as "B-PERSON". This is the format used for IOB tags in CoNLL-U files. Your tags "in" and "an" are not in that format, so they aren't valid.
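To find which lines trigger the error before running spacy convert, a minimal sketch of the tag rule above could look like this. It assumes whitespace-separated columns with the tag in the last column, and it only checks the plain IOB scheme ("O", "B-X", "I-X"). The helpers is_valid_iob_tag and find_bad_tags are hypothetical names for illustration, not spaCy functions.

```python
import re

# Plain IOB: either "O", or "B-"/"I-" followed by a non-empty label like "PERSON".
IOB_TAG = re.compile(r"O|[BI]-\S+")


def is_valid_iob_tag(tag: str) -> bool:
    """Return True if the whole string `tag` follows the plain IOB scheme."""
    return IOB_TAG.fullmatch(tag) is not None


def find_bad_tags(lines):
    """Yield (line_number, tag) for each line whose last column is not an IOB tag.

    Assumes whitespace-separated columns with the tag last. Blank lines
    (sentence or document separators) are skipped.
    """
    for lineno, line in enumerate(lines, start=1):
        cols = line.split()
        if not cols:
            continue
        tag = cols[-1]
        if not is_valid_iob_tag(tag):
            yield lineno, tag


if __name__ == "__main__":
    sample = ["Abhishek\tName", "Jha\tB-PERSON", "", "works\tO"]
    for lineno, tag in find_bad_tags(sample):
        print(f"line {lineno}: ill-formed tag {tag!r}")
```

Run over the real file (for example with find_bad_tags(open("conll_dataset.tsv").read().splitlines())), this reports every offending line at once instead of one word per conversion attempt. Note that plain labels such as "Name" are also reported, since they carry no B-/I- prefix.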

I am not sure what format your data is in, but it doesn't look like normal CoNLL data, especially if the lines really are prefixed with tabs and that's not an accident. You should be able to convert the second column to IOB tags by prepending "I-" to each tag (so "Name" becomes "I-Name"), with the risk that adjacent entities of the same type get merged into one. Look at the example data to see what spaCy expects.
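The "prepend I-" conversion described above can be sketched as follows. This assumes each non-blank line is exactly "token label" separated by whitespace (leading tabs are tolerated), that "O" marks non-entity tokens, and that blank lines separate sentences. The function names to_iob and convert_file and the output filename are hypothetical. Lines with more or fewer than two columns raise an error, since a misaligned row is one plausible way for words like "in" or "an" to end up in the tag column.

```python
def to_iob(lines):
    """Turn "token<ws>label" lines into "token<TAB>I-label" lines.

    Every label that is not "O" and not already B-/I- prefixed gets "I-".
    Consecutive tokens with the same label therefore form one entity, so two
    adjacent entities of the same type are merged. Blank lines are kept.
    """
    out = []
    for lineno, line in enumerate(lines, start=1):
        cols = line.split()  # also strips leading tabs/spaces
        if not cols:
            out.append("")  # keep blank line as a sentence separator
            continue
        if len(cols) != 2:
            raise ValueError(f"line {lineno}: expected 2 columns, got {cols!r}")
        token, label = cols
        if label == "O" or label[:2] in ("B-", "I-"):
            tag = label  # already a valid IOB tag
        else:
            tag = f"I-{label}"
        out.append(f"{token}\t{tag}")
    return out


def convert_file(src, dst):
    """Read `src`, convert it with to_iob, and write the result to `dst`."""
    with open(src, encoding="utf-8") as f:
        lines = f.read().splitlines()
    with open(dst, "w", encoding="utf-8") as f:
        f.write("\n".join(to_iob(lines)) + "\n")


if __name__ == "__main__":
    for row in to_iob(["Abhishek\tName", "Jha\tName", "", "Application\tDesignation"]):
        print(row)
```

After something like convert_file("conll_dataset.tsv", "conll_dataset_iob.tsv"), the converted file can be passed to the same spacy convert command. If adjacent same-type entities matter, the labels would need real B- boundaries, which this sketch cannot infer.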

Upvotes: 1
