Reputation: 19
Can someone please advise me on the hardware requirements for using sentence-transformers/all-MiniLM-L6-v2 for a semantic similarity use case? I downloaded the model locally and am using it to generate embeddings, and finally use util.pytorch_cos_sim to calculate similarity scores between two sentences. Everything worked fine on my Mac Pro (2.4 GHz 8-core Intel Core i9, 32 GB memory), but after I moved the model to containers with 1 CPU core and 4 GB RAM (within my company-provided network), the code takes at least 15-20 times longer to generate the cosine similarity score.
Has anyone faced a similar situation? Kindly advise. Thank you in advance for the help!
N.B.: I am also sharing the sample code for reference.
from sentence_transformers import SentenceTransformer, util

sentences = ["What happens when my account is debited", "What is a debit"]

# Model instantiation (from a local copy of all-MiniLM-L6-v2)
sent_sim_model = SentenceTransformer('./all-MiniLM-L6-v2')

# Encode each sentence to a tensor embedding
embedding_0 = sent_sim_model.encode(sentences[0], convert_to_tensor=True)
embedding_1 = sent_sim_model.encode(sentences[1], convert_to_tensor=True)

# Calculate the cosine similarity score
print(util.pytorch_cos_sim(embedding_0, embedding_1).tolist()[0][0])
I have been running the model successfully on my local system for quite some time now (after storing it locally in the same directory as the code), but once I moved the model and the above code to a Docker container, the response time (which used to be 2-3 seconds on my local system) went up to more than a minute. Since each container I am using has 1 CPU core and 4 GB RAM, I would like to know whether this low-spec hardware could be the cause of the slowdown.
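For what it's worth, here is a minimal timing sketch (illustrative only, reusing the model path from above) that I can run in the container to check whether the extra time is spent loading the model or encoding the sentences; batching both sentences into a single encode() call also avoids a second forward pass:

import time
from sentence_transformers import SentenceTransformer, util

t0 = time.perf_counter()
model = SentenceTransformer('./all-MiniLM-L6-v2')
print(f"model load: {time.perf_counter() - t0:.2f}s")

sentences = ["What happens when my account is debited", "What is a debit"]

t0 = time.perf_counter()
# Encode both sentences in one batched forward pass
embeddings = model.encode(sentences, convert_to_tensor=True)
print(f"encode: {time.perf_counter() - t0:.2f}s")

print(util.pytorch_cos_sim(embeddings[0], embeddings[1]).item())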
Upvotes: 1
Views: 5142
Reputation: 46
I can't add a comment, so I'm giving a full reply.
I built a tiny Docker REST API with Flask and deployed it to https://fly.io/ with under 2 GB of memory, and I get pretty good results:
from flask import Flask, jsonify, request
from flask_cors import CORS
from sentence_transformers import SentenceTransformer, util
from dotenv import load_dotenv

app = Flask(__name__)
CORS(app)
load_dotenv()

# Load the model once at startup so individual requests don't pay the load cost
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

@app.route('/get_embeddings', methods=['POST'])
def get_embeddings():
    text = request.json.get('text')
    embeddings = model.encode(text)
    return jsonify(embeddings=embeddings.tolist())

@app.route('/get_score', methods=['GET'])
def get_score():
    # Expected request body:
    # {
    #     "text": [
    #         "What happens when my account is debited",
    #         "What is a debit"
    #     ]
    # }
    # sentences = request.json.get('text')  # [str, str]
    sentences = ["What happens when my account is debited", "What is a debit"]
    embedding_0 = model.encode(sentences[0], convert_to_tensor=True)
    embedding_1 = model.encode(sentences[1], convert_to_tensor=True)
    score = util.pytorch_cos_sim(embedding_0, embedding_1).tolist()[0][0]
    return jsonify(score=score)

if __name__ == "__main__":
    app.run(host="0.0.0.0", debug=False)
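To sanity-check the API locally, you can hit both endpoints with something like this (a quick sketch using the requests library, assuming Flask's default port 5000):

import requests

# GET the similarity score for the hard-coded sentence pair
print(requests.get("http://localhost:5000/get_score").json())

# POST a sentence and get its embedding back
resp = requests.post("http://localhost:5000/get_embeddings",
                     json={"text": "What is a debit"})
print(len(resp.json()["embeddings"]))  # 384 dimensions for all-MiniLM-L6-v2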
Built with Nixpacks:
nixpacks build ./ --name embedder
Run locally:
docker run -m 1gb --cpus 1 -p 5000:5000 embedder
Scale the memory on Fly:
flyctl scale memory 2048 -a embedder
Deploy to Fly or Railway and test using Postman; it takes a few seconds to show the results.
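One more thing worth trying on a 1-core container (an assumption about your setup, not something I benchmarked on Fly): pin PyTorch to a single thread before the model loads, so its default thread pool doesn't oversubscribe the lone core.

import torch

# Keep PyTorch's intra- and inter-op thread pools at one thread each,
# matching the single CPU core assumed to be available in the container
torch.set_num_threads(1)
torch.set_num_interop_threads(1)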
Upvotes: 1