Mai

Reputation: 131

How do I split a CoNLL-format text file into train, validation, and test sets?

I have a text file that contains data for an NER model, in CoNLL format. A CoNLL file has one token per line, with sentences separated by an empty line. On each line, the first field is the word and the last field is its label.

Harry B-PER
Potter I-PER
was O
a O
student B-MISC
at B-PER
Hogwarts I-PER

Albus B-PER
Dumbledore I-PER
founded O
the O
Order B-ORG
of I-ORG
the I-ORG
Phoenix I-ORG

I want to split the file into three sets (train, validation, and test) in a 70:10:20 ratio. However, I couldn't find any tutorials showing which libraries can be used to split this kind of file.
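For reference, here is a minimal sketch of one possible approach using only the Python standard library. It assumes sentences are separated by blank lines as described above, and that the split should happen at the sentence level so that no sentence is cut in half. The function names, the seed, and the rounding of the split sizes are illustrative choices, not part of any particular library.

```python
import random


def read_conll_sentences(path):
    """Read a CoNLL file and return a list of sentences (each a list of raw lines)."""
    sentences, current = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if line.strip():
                # Non-empty line: one token of the current sentence.
                current.append(line)
            elif current:
                # Blank line: the current sentence is complete.
                sentences.append(current)
                current = []
    if current:
        # The file may not end with a blank line.
        sentences.append(current)
    return sentences


def split_sentences(sentences, train=0.7, valid=0.1, seed=42):
    """Shuffle sentences and split them; the test set receives the remainder."""
    shuffled = list(sentences)
    random.Random(seed).shuffle(shuffled)  # fixed seed for a reproducible split
    n_train = round(len(shuffled) * train)
    n_valid = round(len(shuffled) * valid)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_valid],
        shuffled[n_train + n_valid:],
    )


def write_conll_sentences(path, sentences):
    """Write sentences back out in CoNLL format, one blank line after each."""
    with open(path, "w", encoding="utf-8") as f:
        for sentence in sentences:
            f.write("\n".join(sentence) + "\n\n")
```

With a hypothetical input file `data.conll`, the three functions would be chained: read the sentences, call `split_sentences`, then write each part to its own file (for example `train.conll`, `valid.conll`, `test.conll`). Shuffling whole sentences keeps every B-/I- tag sequence intact, which a line-level split would not guarantee.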

Any help would be appreciated.

Upvotes: 0

Views: 462

Answers (0)