I am new to AI and am looking into LangChain to query, via OpenAI, data that already exists in my MongoDB database.
I have searched a lot, but all the tutorials first load PDF data into MongoDB, then process and index it, and only then start querying it,
for example: https://python.langchain.com/docs/integrations/vectorstores/mongodb_atlas
However, I already have several collections in MongoDB with documents in them, and I want to ask questions about them directly with GPT. What would be the best way to achieve this? If there is a better way than LangChain, I will be happy to explore it.
Thanks
Upvotes: 1
Views: 1232
There are different approaches to achieving your goal. Here's one recommended approach from MongoDB.
This article from MongoDB describes a high-level approach for storing vector embeddings with each document in the existing collection and creating a separate vector search index. When a document is created, the vector embeddings are created at the same time. The developer determines how the vector embeddings are generated (e.g. which field values are used to generate the embeddings).
Example document:
{
"_id": ObjectId("238478293"),
"title": "MongoDB TV",
"description": "All your MongoDB updates, news, videos, and podcast episodes, straight to you!",
"genre": ["Programming", "Database", "MongoDB"],
...
"vectorEmbeddings": [ 0.25, 0.5, 0.75, 0.1, 0.1, 0.8, 0.2, 0.6, 0.6, 0.4, 0.9, 0.3, 0.2, 0.7, 0.5, 0.8, 0.1, 0.8, 0.2, 0.6 ],
...
"seasons": [
...
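Since the collections in the question already exist, the embeddings would have to be added to the existing documents once, rather than only at creation time. Here is a minimal sketch of such a backfill, not the method from the article. The helper names (build_embedding_text, backfill_embeddings), the embed callable and the choice of fields are all hypothetical; collection is assumed to behave like a PyMongo collection (find, update_one).

```python
from typing import Any, Callable, Iterable, List, Mapping


def build_embedding_text(doc: Mapping[str, Any], fields: Iterable[str]) -> str:
    """Join the chosen field values into the text that gets embedded."""
    parts = []
    for name in fields:
        value = doc.get(name)
        if value is None:
            continue
        if isinstance(value, (list, tuple)):
            value = ", ".join(str(v) for v in value)
        parts.append(f"{name}: {value}")
    return "\n".join(parts)


def backfill_embeddings(
    collection: Any,
    embed: Callable[[str], List[float]],
    fields: Iterable[str],
    embedding_field: str = "vectorEmbeddings",
) -> int:
    """Add an embedding to every document that does not have one yet.

    Returns the number of documents that were updated.
    """
    fields = list(fields)
    # Read the matching documents first so we do not update while iterating.
    pending = list(collection.find({embedding_field: {"$exists": False}}))
    updated = 0
    for doc in pending:
        text = build_embedding_text(doc, fields)
        if not text:
            continue  # nothing to embed for this document
        collection.update_one(
            {"_id": doc["_id"]},
            {"$set": {embedding_field: embed(text)}},
        )
        updated += 1
    return updated
```

With real data, collection would come from something like MongoClient(uri)["db"]["coll"], and embed could be OpenAIEmbeddings().embed_query from LangChain. For a large collection you would probably want to process the documents in batches instead of loading them all at once.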
A separate vector search index is created with a vector field type.
{
"fields":[
{
"type": "vector",
"path": "<field-to-index>",
"numDimensions": <number-of-dimensions>,
"similarity": "euclidean | cosine | dotProduct"
}
]
}
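As a rough illustration, the definition above could be built and submitted from Python like this. The function names are hypothetical. The create_vector_index helper assumes a recent PyMongo in which SearchIndexModel accepts a type argument (older versions may not), and it is only a sketch of one way to do it; the index can also be created in the Atlas UI.

```python
from typing import Any, Dict

ALLOWED_SIMILARITIES = {"euclidean", "cosine", "dotProduct"}


def vector_index_definition(
    path: str, num_dimensions: int, similarity: str = "cosine"
) -> Dict[str, Any]:
    """Fill in the placeholders of the vector search index definition."""
    if similarity not in ALLOWED_SIMILARITIES:
        raise ValueError(f"similarity must be one of {sorted(ALLOWED_SIMILARITIES)}")
    if num_dimensions <= 0:
        raise ValueError("num_dimensions must be a positive integer")
    return {
        "fields": [
            {
                "type": "vector",
                "path": path,
                "numDimensions": num_dimensions,
                "similarity": similarity,
            }
        ]
    }


def create_vector_index(collection: Any, name: str, definition: Dict[str, Any]) -> str:
    """Submit the definition to Atlas (assumes a recent PyMongo)."""
    # Imported lazily so vector_index_definition works without PyMongo installed.
    from pymongo.operations import SearchIndexModel

    model = SearchIndexModel(definition=definition, name=name, type="vectorSearch")
    return collection.create_search_index(model)
```

numDimensions has to match the length of the vectors your embedding model returns, for example 1536 for OpenAI's text-embedding-ada-002, and path is the embeddings field (vectorEmbeddings in the example document).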
Extras: Indexing with vector search
Finally, the $vectorSearch aggregation stage can be used to run a vector search against the search index. The search results (the documents) can be parsed and added to the final prompt that's sent to the LLM. The parsing logic is determined by the developer. This approach can be achieved with or without LangChain.
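To make the query step concrete, here is a minimal sketch without LangChain: one helper builds the aggregation pipeline and another turns the returned documents into a prompt. The helper names, the projected fields (title, description) and the prompt wording are assumptions for illustration only.

```python
from typing import Any, Dict, Iterable, List, Mapping, Sequence


def vector_search_pipeline(
    query_vector: Sequence[float],
    index_name: str,
    path: str,
    limit: int = 4,
    num_candidates: int = 100,
    project: Iterable[str] = ("title", "description"),
) -> List[Dict[str, Any]]:
    """Build a $vectorSearch pipeline that returns the chosen fields plus a score."""
    if num_candidates < limit:
        raise ValueError("num_candidates must be at least as large as limit")
    projection: Dict[str, Any] = {name: 1 for name in project}
    projection["_id"] = 0
    projection["score"] = {"$meta": "vectorSearchScore"}
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": list(query_vector),
                "numCandidates": num_candidates,
                "limit": limit,
            }
        },
        {"$project": projection},
    ]


def build_prompt(question: str, documents: Iterable[Mapping[str, Any]]) -> str:
    """Format the retrieved documents as context for the LLM."""
    blocks = []
    for doc in documents:
        lines = [f"{key}: {value}" for key, value in doc.items() if key != "score"]
        blocks.append("\n".join(lines))
    context = "\n\n".join(blocks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

The query vector must come from the same embedding model that was used for the stored documents. With PyMongo you would then run results = collection.aggregate(vector_search_pipeline(...)) and pass build_prompt(question, results) to the model.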
Extras: Querying with vector search
MongoDBAtlasVectorSearch
LangChain's MongoDBAtlasVectorSearch can still be used, but the parameter values depend on how the collection and search index were created.
from langchain_community.vectorstores import MongoDBAtlasVectorSearch
from langchain_openai import OpenAIEmbeddings

# namespace has the form "<database>.<collection>", so the existing
# collection is selected here rather than through a separate argument.
vector_search = MongoDBAtlasVectorSearch.from_connection_string(
    connection_string="insert MongoDB connection string",
    namespace="insert MongoDB namespace",
    embedding=OpenAIEmbeddings(disallowed_special=()),
    index_name="insert existing MongoDB vector search index",
    text_key="insert text field name",
    embedding_key="insert embeddings field name",
)
The source code for MongoDBAtlasVectorSearch is fairly easy to navigate and should explain how each of the parameters must be set. Specifically, text_key should be set to the field whose value is embedded and embedding_key must be set to the embeddings field.
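Once the vector store is configured, asking a question could look roughly like the sketch below. answer_with_context is a hypothetical helper, not part of LangChain. It only assumes that the store exposes similarity_search(query, k=...) returning documents with a page_content attribute, and that the model exposes invoke(prompt), as LangChain chat models do.

```python
from typing import Any


def answer_with_context(vector_store: Any, llm: Any, question: str, k: int = 4) -> str:
    """Retrieve the k most similar documents and ask the model about them."""
    docs = vector_store.similarity_search(question, k=k)
    # page_content holds the value of the field configured as text_key.
    context = "\n\n".join(doc.page_content for doc in docs)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    reply = llm.invoke(prompt)
    # Chat models return a message object; plain LLMs return a string.
    return getattr(reply, "content", reply)
```

For example, answer_with_context(vector_search, ChatOpenAI(), "Which shows cover MongoDB?") would use the store created above, assuming ChatOpenAI is imported from langchain_openai.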
Upvotes: 1