I understand that BART and BERT are two different architectures available in HuggingFace Transformers. I wondered whether I could convert a BertTokenizer to a BartTokenizer.
The reason for converting is that a training script requires the BartTokenizer format, while the tokenizer I am using is a BertTokenizer.
I saved my tokenizer and the model to a local folder using save_pretrained. The files are as follows:
BartTokenizer, on the other hand, looks for the following files:
How can I "convert" the BertTokenizer files into BartTokenizer files? Thanks.
I am using the following versions:
and on another machine (with the latest version available):
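A quick way to see which files each tokenizer class expects is to inspect its `vocab_files_names` class attribute, which, as far as I know, is what `from_pretrained` and `save_pretrained` use for the file names. This is a minimal sketch that assumes only that `transformers` is installed; nothing is downloaded.

```python
from transformers import BartTokenizer, BertTokenizer

# Each (slow) tokenizer class declares the vocabulary files it reads and writes
print("BertTokenizer:", BertTokenizer.vocab_files_names)
print("BartTokenizer:", BartTokenizer.vocab_files_names)
```

The BART entry includes a merges file that has no counterpart among BERT's WordPiece files, which is why a direct file-by-file conversion is not straightforward.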
I am not sure there is a simple way to obtain the merges (merges.txt) from your BertTokenizer, but the easiest thing you can do is add the tokens from your BertTokenizer to a pretrained BART tokenizer:
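from transformers import BartTokenizerFast, BertTokenizer
# Placeholder path: the local folder you passed to save_pretrained
bert_tokenizer = BertTokenizer.from_pretrained("path/to/your-bert-tokenizer")
bart_tokenizer = BartTokenizerFast.from_pretrained("facebook/bart-base")
# Add every entry of the BERT vocabulary to the BART tokenizer as an added token
bert_tokens = list(bert_tokenizer.vocab.keys())
bart_tokenizer.add_tokens(bert_tokens)
# If you train a BART model with this tokenizer, resize its embeddings: model.resize_token_embeddings(len(bart_tokenizer))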
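To illustrate why merges.txt cannot simply be read off a BERT vocabulary: BERT's WordPiece only needs a set of tokens (vocab.txt), while BART's BPE needs an ordered list of merge rules whose order is learned during training. The toy sketch below is plain Python with hypothetical data; it is not the transformers implementation, and it skips BART's byte-level encoding and pre-tokenization.

```python
def bpe_segment(word, merges):
    """Toy BPE: repeatedly merge the adjacent pair with the best (lowest) rank."""
    ranks = {pair: rank for rank, pair in enumerate(merges)}
    symbols = list(word)
    while len(symbols) > 1:
        # Rank every adjacent pair; pairs missing from the merge list never merge
        candidates = [
            (ranks.get((a, b), float("inf")), i)
            for i, (a, b) in enumerate(zip(symbols, symbols[1:]))
        ]
        best_rank, i = min(candidates)
        if best_rank == float("inf"):
            break
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols


def wordpiece_segment(word, vocab):
    """Toy WordPiece: greedy longest-match-first, needing only a set of tokens."""
    tokens, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        while start < end:
            # Non-initial pieces carry WordPiece's "##" continuation marker
            candidate = word[start:end] if start == 0 else "##" + word[start:end]
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return ["[UNK]"]
        tokens.append(piece)
        start = end
    return tokens


# Hypothetical data: a ranked merge list (what merges.txt stores) vs. a plain token set (vocab.txt)
merges = [("l", "o"), ("lo", "w"), ("e", "r")]
vocab = {"low", "##er"}
print(bpe_segment("lower", merges))      # ['low', 'er']
print(wordpiece_segment("lower", vocab))  # ['low', '##er']
```

Both produce the same pieces here (apart from the ## marker), but only the BPE version depends on rule order: reorder the merges and the segmentation can change, and that ordering is exactly the information a WordPiece vocab.txt does not contain.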