Ryan

Reputation: 10109

Conversion of custom data to spaCy NER format

I am using Flair to train a custom NER model, but I also want to try out spaCy. My data is currently in this format:

No O
1320160208478 B-NUM
P O
R O
Name O
Ryan B-PER
Dsouza B-PER

Any suggestions on how I could convert this to the spaCy NER format? Thanks in advance.

Upvotes: 0

Views: 612

Answers (1)

aab

Reputation: 11474

spaCy has built-in converters for some common formats, but this isn't quite one of them. I think the easiest format to convert to would be the CoNLL 2003 NER format, which needs two additional space-separated columns with placeholder values between the words and the tags, so that the IOB tags end up in the 4th column:

No _ _ O
1320160208478 _ _ B-NUM
P _ _ O
R _ _ O
Name _ _ O
Ryan _ _ B-PER
Dsouza _ _ B-PER

Put blank lines between sentences. If you have multiple documents in one file, you can add this line between documents to separate them:

-DOCSTART- -X- O O
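
If it helps, here is a rough sketch of that conversion in Python. It assumes each non-blank input line is exactly `token tag` separated by whitespace, and that blank lines already mark sentence boundaries. The function name `to_conll2003` and the `_` placeholder are just my choices for illustration, not anything spaCy provides.

```python
def to_conll2003(lines, placeholder="_"):
    """Turn 'token tag' lines into 'token _ _ tag' lines.

    Blank lines are kept as-is so sentence boundaries survive.
    """
    converted = []
    for line in lines:
        parts = line.split()
        if not parts:
            # Blank line: sentence boundary, pass it through.
            converted.append("")
            continue
        if len(parts) != 2:
            # Assumption: tokens never contain whitespace.
            raise ValueError(f"expected 'token tag', got: {line!r}")
        token, tag = parts
        converted.append(f"{token} {placeholder} {placeholder} {tag}")
    return converted


sample = ["No O", "1320160208478 B-NUM", "", "Ryan B-PER"]
print("\n".join(to_conll2003(sample)))
```

To use it on a file, read the lines from your input, pass them through the function, and write the result joined with newlines to the file you hand to the converter.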

Then you can use a built-in converter:

python -m spacy convert -c ner input.txt output_dir

(Also, are you sure two B-PER tags in a row is correct for Ryan Dsouza in your data?)
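
If those two B-PER tags are meant to be one name, the second token would normally be I-PER in IOB. Below is a hedged sketch for rewriting the tags. It assumes that two adjacent B- tags of the same type always belong to the same entity, which may not be true for your data, so check before applying it. The function name `merge_adjacent_b_tags` is hypothetical.

```python
def merge_adjacent_b_tags(tags):
    """Rewrite a B-X tag as I-X when it directly follows another X tag.

    Assumption: adjacent same-type B- tags are one entity, not two.
    """
    fixed = []
    prev_label = None  # entity type of the previous token, or None for O
    for tag in tags:
        if "-" not in tag:
            # 'O' (or anything without a prefix) ends the current entity.
            fixed.append(tag)
            prev_label = None
            continue
        prefix, label = tag.split("-", 1)
        if prefix == "B" and prev_label == label:
            fixed.append(f"I-{label}")
        else:
            fixed.append(tag)
        prev_label = label
    return fixed


print(merge_adjacent_b_tags(["O", "B-PER", "B-PER"]))
```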

Upvotes: 1
