Zhang_kg

Reputation: 11

How to generate a merge file and a vocab file in the NLP field

I want to use the Megatron framework to pre-train a Chinese NLP model. I currently have a Chinese corpus and a vocab.txt file, but most frameworks seem to require vocab.json and merge.txt instead. Can these two files be generated from a Chinese corpus? If so, how?

I searched Google for relevant tutorials and answers but found nothing suitable. I am looking for a method to generate the vocab file and the merge file.
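For reference, a vocab.json plus a merges file is the usual output of byte-pair encoding (BPE) training, so both can in principle be learned directly from a raw corpus. Below is a minimal, standard-library-only sketch of that idea. It is a character-level toy, not the byte-level scheme GPT-2 style tokenizers use, so its output should not be assumed to load into Megatron unchanged. The function names `train_bpe` and `save_bpe` and the output file name `merges.txt` are hypothetical choices for illustration, and treating each whitespace-separated chunk as one "word" is an assumption (unsegmented Chinese lines become a single word each). For real training, a library such as Hugging Face `tokenizers` (its `ByteLevelBPETokenizer` can train on files and save a `vocab.json` and `merges.txt`) is probably the more realistic route.

```python
import json
from collections import Counter
from pathlib import Path


def train_bpe(lines, num_merges):
    """Learn up to num_merges BPE merges from text lines (character-level toy)."""
    # Each whitespace-separated chunk becomes a tuple of single characters.
    # Assumption: unsegmented Chinese text yields one chunk per line.
    words = Counter(tuple(w) for line in lines for w in line.split())
    # Seed the vocabulary with every character seen, in sorted order.
    vocab = {}
    for ch in sorted({ch for word in words for ch in word}):
        vocab[ch] = len(vocab)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for word, freq in words.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # nothing left to merge
        # Most frequent pair wins; ties are broken by the pair itself.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        vocab.setdefault(best[0] + best[1], len(vocab))
        # Rewrite every word with the chosen pair fused into one symbol.
        merged_words = Counter()
        for word, freq in words.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            merged_words[tuple(out)] += freq
        words = merged_words
    return vocab, merges


def save_bpe(vocab, merges, out_dir):
    """Write vocab.json and merges.txt (hypothetical file names) to out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "vocab.json").write_text(
        json.dumps(vocab, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    # GPT-2 style merges files start with a version header line.
    body = ["#version: 0.2"] + [f"{a} {b}" for a, b in merges]
    (out / "merges.txt").write_text("\n".join(body) + "\n", encoding="utf-8")
```

A call such as `vocab, merges = train_bpe(open("corpus.txt", encoding="utf-8"), 30000)` followed by `save_bpe(vocab, merges, "out")` would produce the two files, where `corpus.txt` and `out` are placeholder paths. The existing vocab.txt is not used here, since BPE learns its own vocabulary from the corpus.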

Upvotes: 1

Views: 196

Answers (0)
