Reputation: 73
I am trying to convert CoNLL-format data into spaCy's JSON format to train a model.
I am using spaCy's convert command for this. I tried the following command:
python -m spacy convert conll_dataset.tsv /Users/user/docs -t json -c ner
I am getting a ValueError:
ValueError: [E177] Ill-formed IOB input detected: in
I deleted every occurrence of 'in' from the dataset and tried again, but then I got the same error with a different word:
ValueError: [E177] Ill-formed IOB input detected: an
Please help me out with this problem. My dataset looks like this:
Abhishek Name
Jha Name
Application Designation
Development Designation
Associate Designation
I am using spaCy 2.3.2.
Upvotes: 2
Views: 807
Reputation: 15593
IOB format means a tag is either blank, "O", or something like "B-PERSON". This is the format used for IOB tags in CoNLL-U files. Your tags "in" and "an" are not in that format, so they aren't valid.
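To find which lines break that rule before running the converter, something like the following might help. It is only a rough sketch of the rule as described above, not spaCy's own validation, and it assumes the tag is the last whitespace-separated column on each line. The name `find_bad_tags` is made up for illustration.

```python
import re

# Accepts "O" or a prefixed tag such as "B-PERSON" / "I-PERSON".
# Assumption: this mirrors the rule described above, not spaCy's exact check.
IOB_TAG = re.compile(r"^(?:O|[BI]-\S+)$")


def find_bad_tags(lines):
    """Return (line_number, tag) pairs whose last column is not an IOB tag."""
    bad = []
    for lineno, line in enumerate(lines, start=1):
        parts = line.split()
        if len(parts) < 2:
            # Blank line, or a token with a blank tag: treated as acceptable.
            continue
        tag = parts[-1]
        if not IOB_TAG.match(tag):
            bad.append((lineno, tag))
    return bad


if __name__ == "__main__":
    sample = ["Abhishek Name", "Jha B-Name", "in O", ""]
    for lineno, tag in find_bad_tags(sample):
        print(f"line {lineno}: unexpected tag {tag!r}")
```

Running it over the whole file should show every offending tag at once, instead of hitting them one at a time through the converter.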
I am not sure what format your data is in, but it doesn't look like normal CoNLL data, especially if the lines are actually prefaced with tabs and that's not an accident. You should be able to convert the second column to IOB tags by prepending "I-" to each tag (so "Name" becomes "I-Name"), at the risk of merging adjacent entities of the same type. Look at the example data to see what spaCy expects.
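A rough sketch of that conversion, assuming each line is a token followed by a plain label separated by whitespace, and that a blank line marks a sentence boundary. It is a slight variation on the "prepend I-" idea: the first token of each run gets "B-" and the rest get "I-". Adjacent entities with the same label still end up merged into one. The function name `to_iob` and the `outside_label` parameter are made up for illustration.

```python
def to_iob(lines, outside_label="O"):
    """Turn 'token label' lines into 'token<TAB>IOB-tag' lines.

    Assumption: the label is the last whitespace-separated column and the
    token is the first. Consecutive tokens with the same label are treated
    as one entity, so two adjacent entities of the same type get merged.
    """
    out = []
    prev_label = None
    for line in lines:
        parts = line.split()
        if not parts:
            # Blank line: keep it as a sentence boundary and end the current run.
            out.append("")
            prev_label = None
            continue
        if len(parts) < 2:
            raise ValueError(f"expected 'token label', got: {line!r}")
        token, label = parts[0], parts[-1]
        if label == outside_label:
            tag = "O"
            prev_label = None
        else:
            # "B-" starts a new run, "I-" continues the previous one.
            prefix = "I" if label == prev_label else "B"
            tag = f"{prefix}-{label}"
            prev_label = label
        out.append(f"{token}\t{tag}")
    return out


if __name__ == "__main__":
    sample = [
        "Abhishek Name",
        "Jha Name",
        "Application Designation",
        "Development Designation",
        "Associate Designation",
    ]
    print("\n".join(to_iob(sample)))
```

Write the result to a new file and point the convert command at that file. Whether the converter then accepts it also depends on the rest of the file's layout, so compare it against the example data first.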
Upvotes: 1