junaidp

Reputation: 11211

Communicate with my existing data in MongoDB using LangChain

I am new to AI and am looking into LangChain so that I can query data that already exists in my MongoDB database using OpenAI.

I searched a lot, but all the tutorials first load PDF data into MongoDB, then process and index it, and only then start querying it,
for example: https://python.langchain.com/docs/integrations/vectorstores/mongodb_atlas

But I already have several collections in MongoDB with documents in them, and I want to ask GPT questions about that data directly. What would be the best way to achieve this? If there is a better way than LangChain, I will be happy to explore it.

Thanks

Upvotes: 1

Views: 1232

Answers (1)

Andrew Nguonly

Reputation: 2621

There are different approaches to achieving your goal. Here's one recommended approach from MongoDB.

Store Vector Embeddings with Document

This article from MongoDB describes a high-level approach for storing vector embeddings with each document in the existing collection and creating a separate vector search index. When a document is created, the vector embeddings are created at the same time. The developer determines how the vector embeddings are generated (e.g. which field values are used to generate the embeddings).

Example document:

{
   "_id": ObjectId("238478293"),
   "title": "MongoDB TV",
   "description": "All your MongoDB updates, news, videos, and podcast episodes, straight to you!",
   "genre": ["Programming", "Database", "MongoDB"],
   ...
   "vectorEmbeddings": [ 0.25, 0.5, 0.75, 0.1, 0.1, 0.8, 0.2, 0.6, 0.6, 0.4, 0.9, 0.3, 0.2, 0.7, 0.5, 0.8, 0.1, 0.8, 0.2, 0.6 ],
   ...
   "seasons": [
   ...
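
For a collection that already holds documents, the embeddings would need to be backfilled once. Below is a minimal sketch of that step, not MongoDB's reference implementation. It assumes a pymongo-style collection object (with `find` and `update_one`) and an `embed` callable that maps a string to a list of floats, for example a thin wrapper around an OpenAI embeddings call. The helper names `build_embedding_text` and `backfill_embeddings` are hypothetical; the `vectorEmbeddings` field name is taken from the example document above.

```python
from typing import Any, Callable, Iterable, List, Mapping


def build_embedding_text(doc: Mapping[str, Any], fields: Iterable[str]) -> str:
    """Join the chosen field values into the single string that gets embedded."""
    parts = []
    for field in fields:
        value = doc.get(field)
        if value is None:
            continue  # skip fields this document does not have
        if isinstance(value, (list, tuple)):
            value = ", ".join(str(item) for item in value)
        parts.append(f"{field}: {value}")
    return "\n".join(parts)


def backfill_embeddings(
    collection: Any,  # assumption: pymongo-style object with find() and update_one()
    embed: Callable[[str], List[float]],  # assumption: text -> embedding vector
    fields: Iterable[str],
    embedding_key: str = "vectorEmbeddings",
) -> int:
    """Add an embedding to every document that does not have one yet."""
    fields = list(fields)
    updated = 0
    # Materialize the matches first so our own updates do not affect iteration.
    for doc in list(collection.find({embedding_key: {"$exists": False}})):
        text = build_embedding_text(doc, fields)
        if not text:
            continue  # nothing to embed for this document
        collection.update_one(
            {"_id": doc["_id"]},
            {"$set": {embedding_key: embed(text)}},
        )
        updated += 1
    return updated
```

Which fields go into `fields` is the developer's decision mentioned above; for the example document, something like `["title", "description", "genre"]` would be a plausible starting point.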

A separate vector search index is created with a vector field type.

{
  "fields":[
    {
      "type": "vector",
      "path": "<field-to-index>",
      "numDimensions": <number-of-dimensions>,
      "similarity": "euclidean | cosine | dotProduct"
    }
  ]
}

Extras: Indexing with vector search
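
The index definition can also be assembled in code, which makes it easier to keep the dimension count in sync with the embedding model. This is only a sketch: `vector_index_definition` is a hypothetical helper, and the 1536 dimensions in the example are an assumption that matches OpenAI's `text-embedding-ada-002` model. Use whatever your embedding model actually produces.

```python
from typing import Any, Dict

# Similarity functions listed in the index template above.
ALLOWED_SIMILARITIES = ("euclidean", "cosine", "dotProduct")


def vector_index_definition(
    path: str, num_dimensions: int, similarity: str = "cosine"
) -> Dict[str, Any]:
    """Build a vector search index definition for one embeddings field."""
    if similarity not in ALLOWED_SIMILARITIES:
        raise ValueError(f"similarity must be one of {ALLOWED_SIMILARITIES}")
    if num_dimensions <= 0:
        raise ValueError("num_dimensions must be a positive integer")
    return {
        "fields": [
            {
                "type": "vector",
                "path": path,
                "numDimensions": num_dimensions,
                "similarity": similarity,
            }
        ]
    }


# Example: index the field used in the sample document (dimension count assumed).
definition = vector_index_definition("vectorEmbeddings", 1536)
```

The resulting dictionary has the same shape as the JSON template, so it can be serialized with `json.dumps` and pasted into the Atlas index editor. Recent driver versions also offer ways to create search indexes programmatically, but check the driver documentation for the exact call.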

Finally, the $vectorSearch aggregation stage can be used to run a vector search against the search index. The search results (the matching documents) can be parsed and added to the final prompt that's sent to the LLM. The parsing logic is up to the developer. This approach works with or without LangChain.

Extras: Querying with vector search
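
Without LangChain, the query side could look roughly like the sketch below. It builds a `$vectorSearch` aggregation pipeline and turns the returned documents into a prompt. The helper names, the default field name `vectorEmbeddings`, the prompt wording, and the `limit`/`num_candidates` defaults are all assumptions chosen for illustration.

```python
from typing import Any, Dict, Iterable, List, Mapping, Sequence


def build_vector_search_pipeline(
    query_vector: Sequence[float],
    index_name: str,
    path: str = "vectorEmbeddings",
    limit: int = 5,
    num_candidates: int = 100,
) -> List[Dict[str, Any]]:
    """Build an aggregation pipeline that returns the nearest documents."""
    if num_candidates < limit:
        raise ValueError("num_candidates must be at least as large as limit")
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": list(query_vector),
                "numCandidates": num_candidates,
                "limit": limit,
            }
        },
        # Drop the (large) embeddings array from the results.
        {"$project": {path: 0}},
    ]


def build_prompt(
    question: str, docs: Iterable[Mapping[str, Any]], fields: Sequence[str]
) -> str:
    """Format selected fields of the retrieved documents as LLM context."""
    blocks = []
    for doc in docs:
        lines = [f"{field}: {doc[field]}" for field in fields if field in doc]
        blocks.append("\n".join(lines))
    context = "\n\n".join(blocks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

With a pymongo collection, the pipeline would be passed to `collection.aggregate(...)`, using a query vector produced by the same embedding model that generated the stored embeddings. The returned documents then go through `build_prompt` before the prompt is sent to the LLM.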

LangChain MongoDBAtlasVectorSearch

LangChain MongoDBAtlasVectorSearch can still be used, but the parameter values will be dependent on how the collection and search index were created.

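from langchain_community.vectorstores import MongoDBAtlasVectorSearch
from langchain_openai import OpenAIEmbeddings

# The namespace ("<database>.<collection>") already identifies the existing
# collection, so no separate collection argument is passed; passing one as
# well would conflict with the collection derived from the namespace.
vector_search = MongoDBAtlasVectorSearch.from_connection_string(
   connection_string="insert MongoDB connection string",
   namespace="insert MongoDB namespace",  # "<database>.<existing collection>"
   embedding=OpenAIEmbeddings(disallowed_special=()),
   index_name="insert existing MongoDB vector search index",
   text_key="insert text field name",
   embedding_key="insert embeddings field name",
)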

The source code for MongoDBAtlasVectorSearch is fairly easy to navigate and should explain how each of the parameters must be set. Specifically, text_key should be set to the field whose value is embedded and embedding_key must be set to the embeddings field.
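
Once the store is configured, asking a question comes down to retrieving similar documents and passing them to the model. Here is a minimal sketch, assuming `vector_store` is an object like `vector_search` above (LangChain vector stores expose `similarity_search`, which returns documents with a `page_content` attribute) and `llm` is a hypothetical callable that takes a prompt string and returns the model's answer. The function name and prompt wording are illustrative only.

```python
from typing import Any, Callable


def answer_question(
    vector_store: Any,  # assumption: exposes similarity_search(query, k=...)
    llm: Callable[[str], str],  # assumption: prompt -> answer text
    question: str,
    k: int = 4,
) -> str:
    """Retrieve the k most similar documents and ask the LLM about them."""
    docs = vector_store.similarity_search(question, k=k)
    # page_content holds the value of the field configured as text_key.
    context = "\n\n".join(doc.page_content for doc in docs)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    return llm(prompt)
```

Keeping the model behind a plain callable means the same function works with any chat model wrapper, and it can be tested with stubs before any API key is involved.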

Upvotes: 1
