I have a text file containing data for an NER model in CoNLL format: one token per line, with sentences separated by an empty line. The first field on each line is the word and the last field is its label. For example:
Harry B-PER
Potter I-PER
was O
a O
student B-MISC
at B-PER
Hogwarts I-PER

Albus B-PER
Dumbledore I-PER
founded O
the O
Order B-ORG
of I-ORG
the I-ORG
Phoenix I-ORG
I want to split the file into three sets (train, valid, and test) in a 70:10:20 ratio, but I haven't found any tutorial showing which libraries can split this kind of file.
Any help would be appreciated.
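For reference, here is a minimal sketch of the kind of split described above, using only the Python standard library. It assumes that sentences really are separated by blank lines and that the split should happen on whole sentences, so no tag sequence is cut in half. The function names, the default seed, and the rounding of the set sizes are illustrative choices, not part of any NER library.

```python
import random


def read_sentences(lines):
    """Group CoNLL lines into sentences; a blank line ends a sentence."""
    sentences, current = [], []
    for line in lines:
        line = line.rstrip("\n")
        if line.strip():
            current.append(line)
        elif current:
            sentences.append(current)
            current = []
    if current:  # the last sentence may not be followed by a blank line
        sentences.append(current)
    return sentences


def split_sentences(sentences, train=0.7, valid=0.1, seed=42):
    """Shuffle whole sentences and cut them into train/valid/test lists.

    The test set receives whatever remains after train and valid,
    so with the defaults it gets roughly 20% of the sentences.
    """
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)  # fixed seed: reproducible split
    n_train = round(len(shuffled) * train)
    n_valid = round(len(shuffled) * valid)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_valid],
        shuffled[n_train + n_valid:],
    )


def write_sentences(path, sentences):
    """Write sentences back in CoNLL format, one blank line after each."""
    with open(path, "w", encoding="utf-8") as f:
        for sentence in sentences:
            f.write("\n".join(sentence) + "\n\n")
```

Typical use would be to open the input file, pass the file object to read_sentences, call split_sentences on the result, and then call write_sentences once per set with output paths of your choosing. With very few sentences the 70:10:20 ratio can only be approximated, since each sentence goes to exactly one set.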