Reputation: 11
I want to use the Megatron framework to pre-train a Chinese NLP model. I currently have a Chinese corpus and a vocab.txt file, but most frameworks seem to require a vocab.json and a merge.txt instead. Can I generate these two files from my Chinese corpus? If so, how?
I have searched Google for tutorials and answers but have not found anything suitable. I am looking for a method to generate the vocab file and the merge file.
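To make the goal concrete, here is a minimal sketch of what generating the two files could look like, assuming the framework expects GPT-2 style byte-level BPE files (a JSON token-to-id map plus a text file of merge rules). The function names `train_bpe` and `save_files`, the output file names, and the choice to treat each corpus line as one training unit are my own assumptions for illustration, and I have not checked that Megatron accepts the output unchanged. Pure Python like this is far too slow for a real corpus; a dedicated library such as Hugging Face `tokenizers` (its `ByteLevelBPETokenizer`) is probably the practical route.

```python
import json
from collections import Counter


def bytes_to_unicode():
    """GPT-2 style map from each byte value to a printable unicode character."""
    bs = list(range(33, 127)) + list(range(161, 173)) + list(range(174, 256))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)  # shift unprintable bytes into a printable range
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


def train_bpe(lines, num_merges):
    """Learn up to num_merges BPE merges over byte-level symbols."""
    byte_map = bytes_to_unicode()
    # Assumption: each non-empty line is one training unit (no word splitting).
    words = Counter(
        tuple(byte_map[b] for b in line.strip().encode("utf-8"))
        for line in lines
        if line.strip()
    )
    vocab = {ch: i for i, ch in enumerate(byte_map.values())}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each unit occurs.
        pairs = Counter()
        for word, freq in words.items():
            for pair in zip(word, word[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=lambda p: (pairs[p], p))  # deterministic tie-break
        merged = best[0] + best[1]
        merges.append(best)
        vocab.setdefault(merged, len(vocab))
        # Rewrite every unit with the chosen pair fused into one symbol.
        new_words = Counter()
        for word, freq in words.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(merged)
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            new_words[tuple(out)] += freq
        words = new_words
    # Assumption: the loader expects the GPT-2 end-of-document token.
    vocab.setdefault("<|endoftext|>", len(vocab))
    return vocab, merges


def save_files(vocab, merges, vocab_path, merges_path):
    """Write the token-to-id map as JSON and the merge rules one per line."""
    with open(vocab_path, "w", encoding="utf-8") as f:
        json.dump(vocab, f, ensure_ascii=False)
    with open(merges_path, "w", encoding="utf-8") as f:
        f.write("#version: 0.2\n")  # header line, as in the GPT-2 merges file
        for left, right in merges:
            f.write(f"{left} {right}\n")
```

With a hypothetical corpus file `corpus.txt`, this would be called as `vocab, merges = train_bpe(open("corpus.txt", encoding="utf-8"), 30000)` followed by `save_files(vocab, merges, "vocab.json", "merges.txt")`. The existing vocab.txt is not used here, since a BPE vocabulary is learned from the corpus together with its merge rules.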
Upvotes: 1
Views: 196