Raptor

Is it possible to convert BertTokenizer to BartTokenizer?

I understand that BART and BERT are two different architectures available in Hugging Face Transformers. I wondered whether I could convert a BertTokenizer to a BartTokenizer.

The reason for converting is that a training script requires a tokenizer in BartTokenizer format, while the tokenizer I am using is a BertTokenizer.

I saved my tokenizer and model to a local folder using save_pretrained. The files are as follows:

BartTokenizer, on the other hand, looks for the following files:

How can I "convert" the BertTokenizer to BartTokenizer files? Thanks.
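As a quick way to tell the two layouts apart, here is a minimal sketch that checks a save_pretrained folder for the standard slow-tokenizer vocabulary files: vocab.txt for BertTokenizer, and vocab.json plus merges.txt for BartTokenizer. It assumes the default file names; `tokenizer_format` is a hypothetical helper (not a transformers API), and it ignores the case where a fast tokenizer is loaded from a single tokenizer.json.

```python
import tempfile
from pathlib import Path

# Default vocabulary file names of the slow tokenizers
# (assumption: the folder was written by save_pretrained with default names)
BERT_FILES = {"vocab.txt"}
BART_FILES = {"vocab.json", "merges.txt"}


def tokenizer_format(folder: str) -> str:
    """Hypothetical helper: guess which slow tokenizer class can load `folder`."""
    present = {p.name for p in Path(folder).iterdir() if p.is_file()}
    if BART_FILES <= present:
        return "bart"
    if BERT_FILES <= present:
        return "bert"
    return "unknown"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Simulate a BertTokenizer folder: vocab.txt plus config files
        for name in ("vocab.txt", "tokenizer_config.json", "special_tokens_map.json"):
            (Path(tmp) / name).write_text("")
        print(tokenizer_format(tmp))  # prints: bert
```

A BERT-style folder has vocab.txt but no merges.txt, so it does not match the file set BartTokenizer expects.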


Using the following versions:

and on another machine (using the latest version available):

Answers (1)

I am not sure there is a simple way to obtain the merges (merges.txt) from your BertTokenizer, but the easiest thing you can do is add the tokens from your BertTokenizer to a pretrained BART tokenizer:

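from transformers import BartTokenizerFast, BertTokenizer

# Placeholder path: the folder you saved your BERT tokenizer to with save_pretrained
bert_tokenizer = BertTokenizer.from_pretrained("path/to/bert_tokenizer")
bart_tokenizer = BartTokenizerFast.from_pretrained("facebook/bart-base")

# Add every BERT vocabulary entry to the BART tokenizer as an extra token
bert_tokens = list(bert_tokenizer.get_vocab().keys())
bart_tokenizer.add_tokens(bert_tokens)
# A model used with this tokenizer needs model.resize_token_embeddings(len(bart_tokenizer))
bart_tokenizer.save_pretrained("path/to/bart_tokenizer")  # placeholder output folder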
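One reason there is no simple way to obtain merges.txt is that the two tokenizers use different algorithms: BERT's WordPiece only needs the set of vocabulary strings in vocab.txt and splits each word greedily by longest match, while BART's BPE applies an ordered list of merge rules. The sketch below contrasts toy versions of both. `wordpiece` and `bpe` are hypothetical helpers, not transformers APIs; the vocabulary and merges are made up; and the BPE ignores BART's byte-level mapping.

```python
def wordpiece(word, vocab):
    """Toy WordPiece (BERT-style): greedy longest-match-first over a vocab set."""
    pieces, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        while start < end:
            cand = word[start:end]
            if start > 0:
                cand = "##" + cand  # continuation pieces carry the ## prefix
            if cand in vocab:
                piece = cand
                break
            end -= 1
        if piece is None:
            return ["[UNK]"]  # no piece matched: the whole word is unknown
        pieces.append(piece)
        start = end
    return pieces


def bpe(word, merges):
    """Toy BPE (BART-style, without byte-level mapping): needs ORDERED merges."""
    ranks = {pair: i for i, pair in enumerate(merges)}
    symbols = list(word)
    while len(symbols) > 1:
        pairs = [(symbols[i], symbols[i + 1]) for i in range(len(symbols) - 1)]
        # Pick the adjacent pair whose merge rule comes earliest in the list
        best = min(pairs, key=lambda p: ranks.get(p, float("inf")))
        if best not in ranks:
            break  # no applicable merge rule left
        merged, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


if __name__ == "__main__":
    print(wordpiece("unaffable", {"un", "##aff", "##able"}))  # ['un', '##aff', '##able']
    merges = [("u", "n"), ("a", "f"), ("af", "f"), ("a", "b"), ("ab", "l"), ("abl", "e")]
    print(bpe("unaffable", merges))  # ['un', 'aff', 'able']
    print(bpe("unaffable", []))  # no merge rules: single characters only
```

Both produce similar pieces here, but the BPE result depends on the order of the merge rules, which vocab.txt does not record.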
