ctiid

Reputation: 365

huggingface - save fine tuned model locally - and tokenizer too?

I just wonder if the tokenizer is somehow affected or changed if I fine-tune a BERT model and save it. Do I need to save the tokenizer locally too, so I can reload it when using the saved BERT model later?

I just do:

bert_model.save_pretrained('./Fine_tune_BERT/')

then later

bert_model = TFBertModel.from_pretrained('./Fine_tune_BERT/')

But do I need to save the tokenizer too? Or could I just use it in the normal way, like:

tokenizer = BertTokenizer.from_pretrained('bert-base-cased')

Upvotes: 4

Views: 4708

Answers (2)

Ashwin Geet D'Sa

Reputation: 7369

In your case, the tokenizer does not need to be saved, since you have not changed it or added new tokens. Hugging Face tokenizers provide an option to add new tokens or to redefine special tokens such as [MASK], [CLS], etc. If you make such modifications, you will have to save the tokenizer to reuse it later, as sketched below.
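A minimal sketch of that case, assuming the TensorFlow classes from the question (the added tokens are purely illustrative):

from transformers import BertTokenizer, TFBertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
bert_model = TFBertModel.from_pretrained('bert-base-cased')

# Hypothetical new tokens; adding them changes the vocabulary size
num_added = tokenizer.add_tokens(['newtoken1', 'newtoken2'])
if num_added > 0:
    # Grow the embedding matrix to match the enlarged vocabulary
    bert_model.resize_token_embeddings(len(tokenizer))

# ... fine-tune bert_model ...

# Save both so the vocabulary and the resized embeddings stay in sync
bert_model.save_pretrained('./Fine_tune_BERT/')
tokenizer.save_pretrained('./Fine_tune_BERT/')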

Upvotes: 8

Jindřich

Reputation: 11220

The tokenizer cannot be affected by fine-tuning. The tokenizer converts the tokens to vocabulary indices, which need to remain the same during training; otherwise, it would not be possible to train the static embeddings at the beginning of the BERT computation.
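A quick way to see this (a minimal sketch, not from the answer): the stock tokenizer's token-to-index mapping is fixed, so it produces identical input IDs whether you pair it with the original or the fine-tuned model.

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
# Same IDs before and after fine-tuning the model weights
print(tokenizer.encode('Fine-tuning does not change the vocabulary.'))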

Upvotes: 5
