Reputation: 31
My dataset and NLP task are very different from the large corpus the authors used to pre-train their model (https://github.com/google-research/bert#pre-training-with-bert), so I can't directly fine-tune. Is there any example code/GitHub repo that can help me train BERT on my own data? I expect to get embeddings like GloVe.
Thank you very much!
Upvotes: 3
Views: 3639
Reputation: 7379
Yes, you can get BERT embeddings, like other word embeddings, using the extract_features.py
script. It lets you select the layers from which you want the output. Usage is simple: save one sentence per line in a text file and pass it as input. The output will be a JSONL file containing contextual embeddings for each token.
The script's usage and documentation are provided at: https://github.com/google-research/bert#using-bert-to-extract-fixed-feature-vectors-like-elmo
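Once the script has run (with the flags shown in that README), here is a minimal sketch of how you might read the per-token vectors back in Python. It assumes the output file is named output.jsonl and that each line follows the structure the script writes ("features" → "token" / "layers" / "values"); adjust the path and layer choice to your own run.

```python
import json

# The extraction step itself is run from the command line as documented in the
# README linked above (paths below are placeholders for your own setup):
#   python extract_features.py \
#     --input_file=input.txt --output_file=output.jsonl \
#     --vocab_file=$BERT_BASE_DIR/vocab.txt \
#     --bert_config_file=$BERT_BASE_DIR/bert_config.json \
#     --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
#     --layers=-1,-2,-3,-4

# Read the JSONL output: one JSON object per input sentence.
embeddings = []  # one {token: vector} dict per sentence
with open("output.jsonl", "r", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        sentence_vectors = {}
        for feature in record["features"]:
            token = feature["token"]
            # "layers" holds one entry per requested layer (e.g. -1,-2,-3,-4);
            # here we keep only the top layer's values as the token embedding.
            top_layer = feature["layers"][0]
            sentence_vectors[token] = top_layer["values"]
        embeddings.append(sentence_vectors)

print(len(embeddings), "sentences processed")
```

Note that, unlike GloVe, these vectors are contextual: the same token can get a different vector in every sentence, so you work with per-sentence token vectors rather than a single lookup table.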
Upvotes: 3