Reputation: 11
I have a database (currently a JSON file) of keywords and their embedding data that I created with OpenAI's embeddings. What I am trying to do is a similarity search against an input keyword. In my current flow, when I enter a keyword it is first embedded and then compared with the embeddings of the data in my database. What I have observed is that once I add tens of thousands of entries, the similarity search takes too much time. What is a good solution I can implement to get fast results even when there are millions of embeddings in my database?
I have seen suggestions about implementing FAISS by Meta, but I am not sure whether that is the proper solution (a rough sketch of what I think it would look like is at the end of this post). Below is the similarity function I currently use:
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

similarities = [cosine_similarity(embedding, embedding_query) for embedding in embeddings]
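For reference, the same computation can also be written as a single matrix operation (a sketch that reuses the embeddings and embedding_query names from above). It avoids the per-item Python loop, but it still compares the query against every stored vector, so it only helps so much:

import numpy as np

# Stack all stored embeddings into one (n_vectors x dim) matrix
matrix = np.array(embeddings)
query = np.array(embedding_query)

# All cosine similarities in one vectorized operation
similarities = matrix @ query / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(query))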
Currently the embedding data is read from a JSON file like below:
import json

with open("data_a_b_j.json", "r") as f:
    embedding_data = json.load(f)

# Extract embeddings and corresponding descriptions from the loaded data
embeddings = [item["embedding"] for item in embedding_data]
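This is roughly what I imagine the FAISS version mentioned above would look like (a minimal sketch, assuming the faiss-cpu package is installed and reusing the embeddings and embedding_query variables from the code above); I am not sure it is the right direction:

import numpy as np
import faiss  # pip install faiss-cpu

# Stack the stored embeddings into a float32 matrix of shape (n_vectors, dim)
xb = np.array(embeddings, dtype="float32")
# Normalize the rows so that inner product equals cosine similarity
faiss.normalize_L2(xb)

# Exact inner-product index; for millions of vectors an approximate index
# (e.g. IndexIVFFlat or IndexHNSWFlat) trades some accuracy for speed
index = faiss.IndexFlatIP(xb.shape[1])
index.add(xb)

# Embed the query keyword as before, then ask for the top 10 matches
xq = np.array([embedding_query], dtype="float32")
faiss.normalize_L2(xq)
scores, ids = index.search(xq, 10)  # ids index into the original embeddings list

With an approximate index the search stays fast even at millions of vectors, at the cost of slightly approximate results.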
Upvotes: 1
Views: 155