Reputation: 371
I am using the SentenceTransformers library (here: https://pypi.org/project/sentence-transformers/#pretrained-models) for creating embeddings of sentences using the pre-trained model bert-base-nli-mean-tokens. I have an application that will be deployed to a device that does not have internet access. Here, it's already been answered, how to save the model Download pre-trained BERT model locally. Yet I'm stuck at loading the saved model from the locally saved path.
When I try to save the model using the above-mentioned technique, these are the output files:
('/bert-base-nli-mean-tokens/tokenizer_config.json',
'/bert-base-nli-mean-tokens/special_tokens_map.json',
'/bert-base-nli-mean-tokens/vocab.txt',
'/bert-base-nli-mean-tokens/added_tokens.json')
When I try to load it in the memory, using
tokenizer = AutoTokenizer.from_pretrained(to_save_path)
I'm getting
Can't load config for '/bert-base-nli-mean-tokens'. Make sure that:
- '/bert-base-nli-mean-tokens' is a correct model identifier listed on 'https://huggingface.co/models'
- or '/bert-base-nli-mean-tokens' is the correct path to a directory containing a config.json
Upvotes: 22
Views: 48145
Reputation: 41
Download all the files from huggingface save them in a folder locally
then do the below:
from sentence_transformers import SentenceTransformer, models
model_path = r'path/to/folder/containing model files'
word_embedding_model = models.Transformer(model_path)
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension())
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])
sentences = ["This is an example sentence", "Each sentence is converted"]
embeddings = model.encode(sentences)
print(embeddings)
Upvotes: 2
Reputation: 501
You can download and load the model like this
from sentence_transformers import SentenceTransformer
modelPath = "local/path/to/model"
model = SentenceTransformer('bert-base-nli-stsb-mean-tokens')
model.save(modelPath)
model = SentenceTransformer(modelPath)
This worked for me.
You can check the SBERT model details for the SentenceTransformer class in the documentation.
Upvotes: 39
Reputation: 124
There are many ways to solve this issue:
from transformers import AutoTokenizer
from transformers import pipeline
tokenizer = AutoTokenizer.from_pretrained("<username>/<model-name>")
from transformers import AutoModelForQuestionAnswering,AutoTokenizer,pipeline
nlp_QA=pipeline('question-answering',model='./abhilash1910/distilbert-squadv1',tokenizer='./abhilash1910/distilbert-squadv1')
QA_inp={
'question': 'What is the fund price of Huggingface in NYSE?',
'context': 'Huggingface Co. has a total fund price of $19.6 million dollars'
}
result=nlp_QA(QA_inp)
result
There are also other ways to resolve this but these might help. Also this list of pretrained models might help.
Upvotes: 3