I am using Flair to train a custom NER model, but I also want to try out spaCy. My data is currently in this format:
No O
1320160208478 B-NUM
P O
R O
Name O
Ryan B-PER
Dsouza B-PER
Any suggestions on how I could convert this to spaCy's NER format? Thanks in advance.
spaCy has built-in converters for some common formats, but this isn't quite one of them. I think the easiest format to convert to is the CoNLL 2003 NER format. That needs two additional space-separated columns with placeholder values between the words and the tags, so that the IOB tags end up in the 4th column:
No _ _ O
1320160208478 _ _ B-NUM
P _ _ O
R _ _ O
Name _ _ O
Ryan _ _ B-PER
Dsouza _ _ B-PER
Put a blank line between sentences. If you have multiple documents in one file, add this line between documents to separate them:
-DOCSTART- -X- O O
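Adding the placeholder columns by hand is tedious, so here is a minimal sketch of a script that does it. It assumes each non-blank input line is "token TAG" with the tag as the last space-separated field (as in the Flair data above) and that blank lines already mark sentence breaks. The function names to_conll2003 and convert_file are just illustrative, not part of spaCy or Flair.

```python
def to_conll2003(lines, add_docstart=True):
    """Convert 'token TAG' lines into CoNLL 2003-style 'token _ _ TAG' lines.

    Assumption: the IOB tag is the last whitespace-separated field.
    Blank input lines are kept as sentence boundaries.
    """
    # Optional document delimiter, in the form shown above
    out = ["-DOCSTART- -X- O O", ""] if add_docstart else []
    for line in lines:
        line = line.strip()
        if not line:
            # Keep sentence breaks, but never emit two blank lines in a row
            if out and out[-1] != "":
                out.append("")
            continue
        token, tag = line.rsplit(maxsplit=1)
        # Two "_" placeholder columns push the tag into the 4th column
        out.append(f"{token} _ _ {tag}")
    return out


def convert_file(src_path, dst_path):
    """Read a two-column file and write the four-column version."""
    with open(src_path, encoding="utf-8") as src:
        converted = to_conll2003(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(converted) + "\n")
```

For example, convert_file("flair_train.txt", "input.txt") (hypothetical file names) produces an input.txt you can pass to the converter command.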
Then you can use a built-in converter:
python -m spacy convert -c ner input.txt output_dir
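If you would rather skip the converter, another option is to build the (text, {"entities": [(start, end, label)]}) tuples used in spaCy v2's training examples yourself. This is a minimal sketch under two assumptions: tokens are joined with single spaces to form the text, and an I-X tag that does not continue an X entity simply starts a new one. The function name iob_to_offsets is hypothetical.

```python
def iob_to_offsets(tokens, tags):
    """Join tokens with single spaces and turn IOB tags into
    (start_char, end_char, label) entity spans.

    Assumption: an I-X that does not continue an open X entity
    starts a new entity (lenient IOB handling).
    """
    text = ""
    entities = []
    current = None  # [start, end, label] of the entity being built
    for token, tag in zip(tokens, tags):
        if text:
            text += " "
        start = len(text)
        text += token
        end = len(text)
        prefix, _, label = tag.partition("-")  # "B-PER" -> ("B", "-", "PER")
        if prefix == "I" and current is not None and current[2] == label:
            current[1] = end  # extend the open entity over this token
            continue
        if current is not None:
            entities.append(tuple(current))  # close the previous entity
            current = None
        if prefix in ("B", "I"):
            current = [start, end, label]
    if current is not None:
        entities.append(tuple(current))
    return text, {"entities": entities}
```

Because the text is built from the tokens, every span lines up with token boundaries. Note that with the tags as given, Ryan and Dsouza come out as two separate PER entities.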
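(Also, are you sure that two B-PER tags in a row are correct for Ryan Dsouza in your data? B-PER B-PER means two separate one-token entities; a single two-token name would normally be tagged B-PER I-PER.)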
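To see how often this happens in your data, a small check like the following can list every pair of adjacent tokens that both start an entity of the same type. It only flags them for manual review: two adjacent same-type entities can be legitimate, so it does not rewrite any tags. The function name find_adjacent_b_tags is hypothetical.

```python
def find_adjacent_b_tags(tokens, tags):
    """Return (index, previous_token, token, label) for each position where
    a B-X tag directly follows another B-X tag with the same label X.

    `index` is the position of the second token of the pair.
    """
    hits = []
    for i in range(1, len(tags)):
        prev, cur = tags[i - 1], tags[i]
        if prev.startswith("B-") and cur.startswith("B-") and prev[2:] == cur[2:]:
            hits.append((i, tokens[i - 1], tokens[i], cur[2:]))
    return hits
```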