Reputation: 716
This is my first time working with CoNLL-U files. I can't find a way to merge several of them using the conllu Python library. Any leads would be helpful. Thanks.
Upvotes: 0
Views: 434
Reputation: 14126
Each time you call parse() you get a list of TokenLists back (parse_incr() yields the same TokenLists one at a time). Merging several files can therefore be done by collecting those TokenLists into a single list.
Example:
from io import open
from conllu import parse_incr

files = ["file1.conllu", "file2.conllu", "file3.conllu"]
merged_tokenlists = []
for file in files:
    # Open the current file from the list, not a hard-coded name
    with open(file, "r", encoding="utf-8") as data_file:
        # parse_incr yields one TokenList per sentence
        for tokenlist in parse_incr(data_file):
            merged_tokenlists.append(tokenlist)
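If you also want the merged result back on disk, something along these lines should work. It is a minimal sketch that relies on TokenList.serialize(), which returns a sentence as CoNLL-U text; the helper name write_tokenlists and the output filename are made up for illustration.

```python
def write_tokenlists(tokenlists, out_path):
    """Write already-parsed sentences to a single CoNLL-U file.

    Assumes each item is a conllu TokenList, whose serialize() method
    returns the sentence as CoNLL-U text ending in a blank line.
    """
    with open(out_path, "w", encoding="utf-8") as out_file:
        for tokenlist in tokenlists:
            # serialize() already ends with the blank line that separates
            # sentences, so nothing extra is written between them
            out_file.write(tokenlist.serialize())
```

Usage would be write_tokenlists(merged_tokenlists, "merged.conllu"). Note that this does not renumber anything, so sent_id values that repeat across the input files will repeat in the merged file too.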
Author of the conllu library here, happy to see people using it!
Upvotes: 1