neha tamore

Reputation: 371

Download pre-trained sentence-transformers model locally

I am using the SentenceTransformers library (here: https://pypi.org/project/sentence-transformers/#pretrained-models) to create sentence embeddings with the pre-trained model bert-base-nli-mean-tokens. I have an application that will be deployed to a device without internet access. How to save the model is already answered here: Download pre-trained BERT model locally. However, I'm stuck at loading the saved model from the locally saved path.

When I try to save the model using the above-mentioned technique, these are the output files:

('/bert-base-nli-mean-tokens/tokenizer_config.json',
 '/bert-base-nli-mean-tokens/special_tokens_map.json',
 '/bert-base-nli-mean-tokens/vocab.txt',
 '/bert-base-nli-mean-tokens/added_tokens.json')

When I try to load it into memory using

tokenizer = AutoTokenizer.from_pretrained(to_save_path)

I'm getting

Can't load config for '/bert-base-nli-mean-tokens'. Make sure that:

- '/bert-base-nli-mean-tokens' is a correct model identifier listed on 'https://huggingface.co/models'

- or '/bert-base-nli-mean-tokens' is the correct path to a directory containing a config.json 
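(For context: the error appears because save_pretrained was called only on the tokenizer, so the directory contains the four tokenizer files but no config.json or model weights. Below is a minimal sketch of saving both halves, assuming the standard transformers API; the hub id and path are illustrative.)

from transformers import AutoModel, AutoTokenizer

model_name = 'sentence-transformers/bert-base-nli-mean-tokens'  # illustrative hub id
to_save_path = '/bert-base-nli-mean-tokens'

model = AutoModel.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

model.save_pretrained(to_save_path)      # writes config.json and the weights
tokenizer.save_pretrained(to_save_path)  # writes the four files listed above

tokenizer = AutoTokenizer.from_pretrained(to_save_path)  # now loads offline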

Upvotes: 22

Views: 48145

Answers (3)

Kaustubh Ratna

Reputation: 41

Download all the model files from Hugging Face, save them in a folder locally, and then do the following:

from sentence_transformers import SentenceTransformer, models

# Path to the locally saved model
model_path = r'path/to/folder/containing model files'

# Load the transformer model and tokenizer manually
word_embedding_model = models.Transformer(model_path)
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension())

# Assemble the sentence transformer model
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])

You can check whether the model loaded correctly by doing the following:

sentences = ["This is an example sentence", "Each sentence is converted"]
embeddings = model.encode(sentences)
print(embeddings)
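For reference, encode returns a NumPy array by default, so you can confirm the shape and compare the two example sentences. A small sketch; note that recent sentence-transformers releases expose util.cos_sim, while older ones call it util.pytorch_cos_sim:

from sentence_transformers import util

print(embeddings.shape)  # (2, embedding_dim) for the two input sentences
print(util.cos_sim(embeddings[0], embeddings[1]))  # cosine similarity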

Upvotes: 2

elotech

Reputation: 501

You can download and load the model like this:

from sentence_transformers import SentenceTransformer

modelPath = "local/path/to/model"

# Download the model once, save it to disk, then reload it from the local path
model = SentenceTransformer('bert-base-nli-stsb-mean-tokens')
model.save(modelPath)
model = SentenceTransformer(modelPath)

This worked for me.

You can check the SBERT model details for the SentenceTransformer class in the documentation.
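As an extra sanity check on the offline device, you can reload from the saved path and encode a sentence. A minimal sketch; the sentence is illustrative:

model = SentenceTransformer(modelPath)
embedding = model.encode("This is an example sentence")
print(embedding.shape)  # a single vector of the model's embedding dimension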

Upvotes: 39

Abhilash Majumder

Reputation: 124

There are many ways to solve this issue:

  • Assuming you have trained your BERT base model locally (Colab/notebook), in order to use it with the Hugging Face AutoClass, the model (along with the tokenizer, vocab.txt, configs, special tokens, and TF/PyTorch weights) has to be uploaded to Hugging Face. The steps to do this are described here. Once it is uploaded, a repository is created under your username, and the model can then be accessed as follows:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("<username>/<model-name>")
  • The second way is to use the trained model locally, which can be done with pipelines. The following is an example of how to use a model trained (and saved) locally for your use case, taken from my locally trained QA model (see also the embedding sketch after this list):
from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline

nlp_QA = pipeline('question-answering',
                  model='./abhilash1910/distilbert-squadv1',
                  tokenizer='./abhilash1910/distilbert-squadv1')
QA_inp = {
    'question': 'What is the fund price of Huggingface in NYSE?',
    'context': 'Huggingface Co. has a total fund price of $19.6 million dollars'
}
result = nlp_QA(QA_inp)
print(result)
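The same local-path pattern also covers the original embedding use case: transformers' feature-extraction pipeline can produce token embeddings from a locally saved directory. A sketch, with an illustrative directory name; note that you must pool the token vectors yourself (bert-base-nli-mean-tokens uses mean pooling) to get a sentence embedding:

from transformers import pipeline
import numpy as np

extractor = pipeline('feature-extraction',
                     model='./bert-base-nli-mean-tokens',
                     tokenizer='./bert-base-nli-mean-tokens')

tokens = np.array(extractor("This is an example sentence")[0])
sentence_embedding = tokens.mean(axis=0)  # mean pooling over token vectors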

There are other ways to resolve this as well, but these should help. This list of pretrained models may also be useful.

Upvotes: 3
