Reputation: 365
I just wonder whether the tokenizer is somehow affected or changed when I fine-tune a BERT model and save it. Do I need to save the tokenizer locally too, so I can reload it when using the saved BERT model later?
I just do:
bert_model.save_pretrained('./Fine_tune_BERT/')
then later
bert_model = TFBertModel.from_pretrained('./Fine_tune_BERT/')
But do I need to save the tokenizer too? Or could I just use it in the normal way, like:
tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
Upvotes: 4
Views: 4708
Reputation: 7369
In your case, the tokenizer need not be saved, as you have not changed it or added new tokens. The Hugging Face tokenizer provides an option to add new tokens or to redefine special tokens such as [MASK], [CLS], etc. If you make such modifications, then you will have to save the tokenizer to reuse it later.
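For example, a minimal sketch of that case (assuming a bert-base-cased checkpoint and a hypothetical new token [NEW_TOK]):
from transformers import BertTokenizer, TFBertModel
tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
bert_model = TFBertModel.from_pretrained('bert-base-cased')
# Adding a token changes the tokenizer's state, so it must be saved later.
tokenizer.add_tokens(['[NEW_TOK]'])
# The embedding matrix must be resized to match the enlarged vocabulary.
bert_model.resize_token_embeddings(len(tokenizer))
# ... fine-tune the model here ...
# Save both to the same directory so they stay in sync.
bert_model.save_pretrained('./Fine_tune_BERT/')
tokenizer.save_pretrained('./Fine_tune_BERT/')
# Later, reload both from that directory instead of 'bert-base-cased'.
tokenizer = BertTokenizer.from_pretrained('./Fine_tune_BERT/')
bert_model = TFBertModel.from_pretrained('./Fine_tune_BERT/')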
Upvotes: 8
Reputation: 11220
The tokenizer cannot be affected by fine-tuning. The tokenizer converts tokens to vocabulary indices, which need to remain the same during training; otherwise, it would not be possible to train the static embeddings at the beginning of the BERT computation.
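To illustrate (a sketch, assuming the stock bert-base-cased tokenizer): the token-to-id mapping is deterministic given a fixed vocabulary, and those ids are row indices into the embedding matrix that fine-tuning updates.
from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
# The mapping from text to ids is fixed by the vocabulary;
# fine-tuning updates weights, never this mapping.
ids = tokenizer('Hello world')['input_ids']
print(ids)  # identical before and after fine-tuning
# Each id selects a row of the model's embedding matrix, which is
# why the ids must stay stable while those rows are being trained.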
Upvotes: 5