Reputation: 31
My dataset and NLP task are very different from the large corpus the authors used to pre-train their model (https://github.com/google-research/bert#pre-training-with-bert), so I can't directly fine-tune. Is there any example code/GitHub repo that can help me train BERT on my own data? I expect to get embeddings like GloVe.
Thank you very much!
Upvotes: 3
Views: 3639
Reputation: 7379
Yes, you can get BERT embeddings, like other word embeddings, using the extract_features.py
script. It lets you select the layers from which you want the output. Usage is simple: save one sentence per line in a text file and pass it as input. The output will be a JSONL file containing contextual embeddings for each token.
The script's usage and documentation are provided at: https://github.com/google-research/bert#using-bert-to-extract-fixed-feature-vectors-like-elmo
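Once the script has run (with the flags shown in that README), here is a minimal sketch of how you might read the per-token vectors back in Python. It assumes the output file is named output.jsonl and that each line follows the structure the script writes ("features" → "token" / "layers" / "values"); adjust the path and layer choice to your own run.

```python
import json

# The extraction step itself is run from the command line as documented in the
# README linked above (paths below are placeholders for your own setup):
#   python extract_features.py \
#     --input_file=input.txt --output_file=output.jsonl \
#     --vocab_file=$BERT_BASE_DIR/vocab.txt \
#     --bert_config_file=$BERT_BASE_DIR/bert_config.json \
#     --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
#     --layers=-1,-2,-3,-4

# Read the JSONL output: one JSON object per input sentence.
embeddings = []  # one {token: vector} dict per sentence
with open("output.jsonl", "r", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        sentence_vectors = {}
        for feature in record["features"]:
            token = feature["token"]
            # "layers" holds one entry per requested layer (e.g. -1,-2,-3,-4);
            # here we keep only the top layer's values as the token embedding.
            top_layer = feature["layers"][0]
            sentence_vectors[token] = top_layer["values"]
        embeddings.append(sentence_vectors)

print(len(embeddings), "sentences processed")
```

Note that, unlike GloVe, these vectors are contextual: the same token can get a different vector in every sentence, so you work with per-sentence token vectors rather than a single lookup table.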
Upvotes: 3