nsb

Reputation: 19

Hardware requirements for using sentence-transformers/all-MiniLM-L6-v2

Can someone please advise me on the hardware requirements for using sentence-transformers/all-MiniLM-L6-v2 in a semantic-similarity use case? I downloaded the model locally and am using it to generate embeddings, then using util.pytorch_cos_sim to calculate similarity scores between two sentences. Everything worked fine on my Mac Pro (2.4 GHz 8-core Intel Core i9, 32 GB memory), but after I moved the model to containers with 1 CPU core and 4 GB RAM (on my company's network), the code takes at least 15-20 times longer to produce the cosine similarity score.

Has anyone faced a similar situation? Any advice would be appreciated. Thank you in advance for the help!

N.B.: I am also sharing the sample code for reference.

from sentence_transformers import SentenceTransformer, util

sentences = ["What happens when my account is debited", "What is a debit"]

# Model instantiation (from a local copy of all-MiniLM-L6-v2)
sent_sim_model = SentenceTransformer('./all-MiniLM-L6-v2')

# Encode each sentence into an embedding tensor
embedding_0 = sent_sim_model.encode(sentences[0], convert_to_tensor=True)
embedding_1 = sent_sim_model.encode(sentences[1], convert_to_tensor=True)

# Calculate the cosine similarity score
print(util.pytorch_cos_sim(embedding_0, embedding_1).tolist()[0][0])

I have been running the model successfully on my local system for quite some time now (after storing it locally in the same directory as the code), but once I moved the model and the above code to a Docker container, the response time went from 2-3 seconds on my local system to more than a minute. Since each container has 1 CPU core and 4 GB RAM, I would like input on whether this limited hardware could be the cause.
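For what it's worth, below is a small timing harness to check whether the model load, the encode step, or the cosine computation dominates. The torch.set_num_threads(1) call is only a guess at a mitigation (PyTorch may spawn more intra-op threads than a 1-core container can actually use); I have not confirmed it helps:

import time

import torch
from sentence_transformers import SentenceTransformer, util

# Guessed mitigation: pin PyTorch's intra-op thread pool to the single
# core the container exposes, to avoid thread oversubscription.
torch.set_num_threads(1)

start = time.perf_counter()
sent_sim_model = SentenceTransformer('./all-MiniLM-L6-v2')
print(f"model load: {time.perf_counter() - start:.2f}s")

sentences = ["What happens when my account is debited", "What is a debit"]

start = time.perf_counter()
embeddings = sent_sim_model.encode(sentences, convert_to_tensor=True)
print(f"encode: {time.perf_counter() - start:.2f}s")

start = time.perf_counter()
score = util.pytorch_cos_sim(embeddings[0], embeddings[1]).tolist()[0][0]
print(f"cosine: {time.perf_counter() - start:.4f}s, score={score:.4f}")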

Upvotes: 1

Views: 5142

Answers (1)

Navicstein rotciv

Reputation: 46

I can't add a comment, so I'm giving a full reply.

I built a tiny Docker REST API with Flask and deployed it to https://fly.io/ with under 2 GB of RAM, and I get pretty good results:

from flask import Flask, jsonify, request
from flask_cors import CORS
from sentence_transformers import SentenceTransformer, util
from dotenv import load_dotenv

app = Flask(__name__)
CORS(app)
load_dotenv()

# Load the model once at startup, not per request
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

@app.route('/get_embeddings', methods=['POST'])
def get_embeddings():
    # Expects a JSON body like {"text": "some sentence"}
    text = request.json.get('text')
    embeddings = model.encode(text)
    return jsonify(embeddings=embeddings.tolist())


@app.route('/get_score', methods=['GET'])
def get_score():
    # Expected JSON body once the hard-coded sentences are removed:
    #   {
    #     "text": [
    #         "What happens when my account is debited",
    #         "What is a debit"
    #     ]
    #   }
    # sentences = request.json.get('text')  # [str, str]
    sentences = ["What happens when my account is debited", "What is a debit"]
    embedding_0 = model.encode(sentences[0], convert_to_tensor=True)
    embedding_1 = model.encode(sentences[1], convert_to_tensor=True)
    score = util.pytorch_cos_sim(embedding_0, embedding_1).tolist()[0][0]
    return jsonify(score=score)

if __name__ == "__main__":
    app.run(host="0.0.0.0", debug=False)

Built with Nixpacks:

nixpacks build ./ --name embedder

Run locally:

docker run -m 1gb --cpus 1 -p 5000:5000 embedder

To scale the app's memory on Fly:

flyctl scale memory 2048 -a embedder

Deploy to Fly or Railway and test using Postman; it takes a few seconds to return the results.
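For example, a quick check from Python instead of Postman (assuming the container is running locally on port 5000, as in the docker run command above):

import requests

# Score the two hard-coded sentences via the GET endpoint
resp = requests.get("http://localhost:5000/get_score")
print(resp.json())  # e.g. {"score": <cosine similarity>}

# Fetch raw embeddings for a single sentence via the POST endpoint
resp = requests.post(
    "http://localhost:5000/get_embeddings",
    json={"text": "What is a debit"},
)
print(len(resp.json()["embeddings"]))  # all-MiniLM-L6-v2 outputs 384 dimensions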

Upvotes: 1
