neha tamore

Reputation: 371

Download pre-trained sentence-transformers model locally

I am using the SentenceTransformers library (here: https://pypi.org/project/sentence-transformers/#pretrained-models) to create sentence embeddings with the pre-trained model bert-base-nli-mean-tokens. I have an application that will be deployed to a device without internet access. How to save the model is already answered here: Download pre-trained BERT model locally. However, I'm stuck at loading the saved model from the locally saved path.

When I try to save the model using the above-mentioned technique, these are the output files:

('/bert-base-nli-mean-tokens/tokenizer_config.json',
 '/bert-base-nli-mean-tokens/special_tokens_map.json',
 '/bert-base-nli-mean-tokens/vocab.txt',
 '/bert-base-nli-mean-tokens/added_tokens.json')

When I try to load it into memory using

tokenizer = AutoTokenizer.from_pretrained(to_save_path)

I'm getting

Can't load config for '/bert-base-nli-mean-tokens'. Make sure that:

- '/bert-base-nli-mean-tokens' is a correct model identifier listed on 'https://huggingface.co/models'

- or '/bert-base-nli-mean-tokens' is the correct path to a directory containing a config.json 
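(For context: the error appears because save_pretrained was called only on the tokenizer, so the directory contains the four tokenizer files but no config.json or model weights. Below is a minimal sketch of saving both halves, assuming the standard transformers API; the hub id and path are illustrative.)

from transformers import AutoModel, AutoTokenizer

model_name = 'sentence-transformers/bert-base-nli-mean-tokens'  # illustrative hub id
to_save_path = '/bert-base-nli-mean-tokens'

model = AutoModel.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

model.save_pretrained(to_save_path)      # writes config.json and the weights
tokenizer.save_pretrained(to_save_path)  # writes the four files listed above

tokenizer = AutoTokenizer.from_pretrained(to_save_path)  # now loads offline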

Upvotes: 22

Views: 48145

Answers (3)

Kaustubh Ratna

Reputation: 41

Download all the model files from Hugging Face, save them in a folder locally, and then do the following:

from sentence_transformers import SentenceTransformer, models

# Path to the locally saved model
model_path = r'path/to/folder/containing model files'

# Load the transformer model and tokenizer manually
word_embedding_model = models.Transformer(model_path)
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension())

# Assemble the sentence transformer model
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])

You can check whether the model loaded correctly by doing the following:

sentences = ["This is an example sentence", "Each sentence is converted"]
embeddings = model.encode(sentences)
print(embeddings)
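For reference, encode returns a NumPy array by default, so you can confirm the shape and compare the two example sentences. A small sketch; note that recent sentence-transformers releases expose util.cos_sim, while older ones call it util.pytorch_cos_sim:

from sentence_transformers import util

print(embeddings.shape)  # (2, embedding_dim) for the two input sentences
print(util.cos_sim(embeddings[0], embeddings[1]))  # cosine similarity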

Upvotes: 2

elotech

Reputation: 501

You can download and load the model like this:

from sentence_transformers import SentenceTransformer

modelPath = "local/path/to/model"

# Download the model once, save it to disk, then reload it from the local path
model = SentenceTransformer('bert-base-nli-stsb-mean-tokens')
model.save(modelPath)
model = SentenceTransformer(modelPath)

This worked for me.

You can check the SBERT model details for the SentenceTransformer class in the documentation.
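As an extra sanity check on the offline device, you can reload from the saved path and encode a sentence. A minimal sketch; the sentence is illustrative:

model = SentenceTransformer(modelPath)
embedding = model.encode("This is an example sentence")
print(embedding.shape)  # a single vector of the model's embedding dimension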

Upvotes: 39

Abhilash Majumder

Reputation: 124

There are many ways to solve this issue:

  • Assuming you have trained your BERT base model locally (Colab/notebook), in order to use it with the Hugging Face AutoClass, the model (along with the tokenizer, vocab.txt, configs, special tokens, and TF/PyTorch weights) has to be uploaded to Hugging Face. The steps to do this are described here. Once it is uploaded, a repository is created under your username, and the model can then be accessed as follows:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("<username>/<model-name>")
  • The second way is to use the trained model locally, which can be done with pipelines. The following is an example of how to use a model trained (and saved) locally for your use case, taken from my locally trained QA model (see also the embedding sketch after this list):
from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline

nlp_QA = pipeline('question-answering',
                  model='./abhilash1910/distilbert-squadv1',
                  tokenizer='./abhilash1910/distilbert-squadv1')
QA_inp = {
    'question': 'What is the fund price of Huggingface in NYSE?',
    'context': 'Huggingface Co. has a total fund price of $19.6 million dollars'
}
result = nlp_QA(QA_inp)
print(result)
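The same local-path pattern also covers the original embedding use case: transformers' feature-extraction pipeline can produce token embeddings from a locally saved directory. A sketch, with an illustrative directory name; note that you must pool the token vectors yourself (bert-base-nli-mean-tokens uses mean pooling) to get a sentence embedding:

from transformers import pipeline
import numpy as np

extractor = pipeline('feature-extraction',
                     model='./bert-base-nli-mean-tokens',
                     tokenizer='./bert-base-nli-mean-tokens')

tokens = np.array(extractor("This is an example sentence")[0])
sentence_embedding = tokens.mean(axis=0)  # mean pooling over token vectors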

There are other ways to resolve this as well, but these should help. This list of pretrained models may also be useful.

Upvotes: 3
