Reputation: 71
I am pretty new to Hugging Face transformers. I am facing the following issue when I try to load the xlm-roberta-base model from a given path:
>>> tokenizer = AutoTokenizer.from_pretrained(model_path)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/anaconda3/lib/python3.7/site-packages/transformers/tokenization_auto.py", line 182, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/home/user/anaconda3/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 309, in from_pretrained
    return cls._from_pretrained(*inputs, **kwargs)
  File "/home/user/anaconda3/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 458, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/home/user/anaconda3/lib/python3.7/site-packages/transformers/tokenization_roberta.py", line 98, in __init__
    **kwargs,
  File "/home/user/anaconda3/lib/python3.7/site-packages/transformers/tokenization_gpt2.py", line 133, in __init__
    with open(vocab_file, encoding="utf-8") as vocab_handle:
TypeError: expected str, bytes or os.PathLike object, not NoneType
However, if I load it by its name, there is no problem:
>>> tokenizer = AutoTokenizer.from_pretrained('xlm-roberta-base')
I would appreciate any help.
Upvotes: 7
Views: 12560
Reputation: 1
Adding use_fast=True to the arguments works for me:
RobertaTokenizer.from_pretrained('Salesforce/codet5-small', use_fast=True)
Upvotes: 0
Reputation: 82
It seems you are missing the model's tokenizer file. Locate the directory of the model; if it is hosted on Hugging Face, switch to the "Files and versions" tab.
Check the parameters you used while downloading the model and fix them, or manually download the missing file and place it in the model directory so it is found when the model is loaded.
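For illustration, a minimal sketch of the manual-download route using the huggingface_hub package. The filename sentencepiece.bpe.model is the tokenizer vocabulary that xlm-roberta-base ships, and the local directory is a hypothetical example; check "Files and versions" for the exact filename your model needs.
from huggingface_hub import hf_hub_download

# Fetch the missing tokenizer file into the local model directory.
# 'sentencepiece.bpe.model' is the vocab file for xlm-roberta-base;
# other models may use vocab.json / merges.txt or tokenizer.json instead.
hf_hub_download(
    repo_id='xlm-roberta-base',
    filename='sentencepiece.bpe.model',
    local_dir='models/xlm-roberta-base',  # hypothetical local path
)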
Upvotes: 0
Reputation: 13
I encountered the same problem. To use models from the local machine, set:
import os
os.environ['TRANSFORMERS_OFFLINE'] = '1'
This tells the library to use local files only. You can read more about it in the Hugging Face docs under Installation - Offline Mode.
from transformers import RobertaTokenizer
tokenizer = RobertaTokenizer.from_pretrained('Model_Path')
The path should be the location of the model folder relative to the current file's directory. For example, if the model files are in an xlm-roberta-base folder inside a models folder, the path should be 'models/xlm-roberta-base/'.
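As a quick sanity check (a sketch, assuming the hypothetical 'models/xlm-roberta-base/' layout above), you can list the folder before loading; the TypeError in the question means the tokenizer could not find its vocab file there:
import os

model_path = 'models/xlm-roberta-base/'  # hypothetical local path
# The tokenizer's vocab file must appear in this listing, otherwise
# from_pretrained ends up calling open(None, ...) as in the traceback.
print(sorted(os.listdir(model_path)))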
Upvotes: 0
Reputation: 1
I encountered the same error message. To fix it, you can add use_fast=True to the arguments.
generator = AutoTokenizer.from_pretrained(generator_path, config=config.generator, use_fast=True)
Upvotes: 0
Reputation: 19365
I assume you have created that directory as described in the documentation with:
tokenizer.save_pretrained('YOURPATH')
There is currently an issue under investigation which affects only the AutoTokenizer classes, not the underlying tokenizers such as XLMRobertaTokenizer. For example, the following should work:
from transformers import XLMRobertaTokenizer
tokenizer = XLMRobertaTokenizer.from_pretrained('YOURPATH')
To work with the AutoTokenizer, you also need to save the config so you can load it offline:
from transformers import AutoTokenizer, AutoConfig
tokenizer = AutoTokenizer.from_pretrained('xlm-roberta-base')
config = AutoConfig.from_pretrained('xlm-roberta-base')
tokenizer.save_pretrained('YOURPATH')
config.save_pretrained('YOURPATH')
tokenizer = AutoTokenizer.from_pretrained('YOURPATH')
I recommend either using a different path for the tokenizer than for the model, or keeping the config.json of your model, because some modifications you apply to your model are stored in the config.json created during model.save_pretrained(),
and that file will be overwritten when you save the tokenizer to the same path afterwards as described above (i.e. you won't be able to load your modified model with the tokenizer's config.json).
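For illustration, a minimal sketch of the separate-paths approach; the 'YOURPATH/model' and 'YOURPATH/tokenizer' subdirectory names are my own hypothetical choice:
from transformers import AutoConfig, AutoModel, AutoTokenizer

model = AutoModel.from_pretrained('xlm-roberta-base')
tokenizer = AutoTokenizer.from_pretrained('xlm-roberta-base')
config = AutoConfig.from_pretrained('xlm-roberta-base')

# Separate directories, so saving the tokenizer's config cannot
# overwrite the config.json written by model.save_pretrained().
model.save_pretrained('YOURPATH/model')          # hypothetical path
tokenizer.save_pretrained('YOURPATH/tokenizer')  # hypothetical path
config.save_pretrained('YOURPATH/tokenizer')

tokenizer = AutoTokenizer.from_pretrained('YOURPATH/tokenizer')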
Upvotes: 3