Reputation: 3005
From the LangChain documentation (Per-User Retrieval):
When building a retrieval app, you often have to build it with multiple users in mind. This means that you may be storing data not just for one user, but for many different users, and they should not be able to see each other’s data. This means that you need to be able to configure your retrieval chain to only retrieve certain information.
The documentation has an example implementation using PineconeVectorStore. Does ChromaDB support multiple users? If so, can anyone help with an example of how per-user retrieval can be implemented using the open-source ChromaDB?
Upvotes: 1
Views: 462
Reputation: 11
In response to @chifu lin's answer: I think you can't separate the owner of each document via metadata, because of the following caution:
Caution: Chroma makes a best-effort to automatically save data to disk, however multiple in-memory clients can stop each other’s work. As a best practice, only have one client per path running at any given time.
I think you can instead use a separate persist directory per user, set through the persist_directory parameter when initializing the Chroma object, something like this:
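from langchain_community.vectorstores import Chroma
username = 'Joe'
# One persist directory per user; `pages` (a list of Documents) and `embeddings` are assumed to be defined earlier
db = Chroma.from_documents(pages, embeddings, persist_directory=f"./chroma_db/{username}")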
When you want to get the data for user Joe, you can load it from disk like this:
vectordb = Chroma(persist_directory=f"./chroma_db/{username}", embedding_function=embeddings)
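One caveat with building the path as f"./chroma_db/{username}": if usernames can come from user input, a value like "../other" could point outside the intended folder. Below is a minimal sketch of a hypothetical helper, user_persist_dir, that only accepts simple usernames before building the path. The allowed-character rule is an assumption; adapt it to however your user IDs actually look.

```python
import re
from pathlib import Path

# Hypothetical rule (an assumption): 1-64 letters, digits, underscores or hyphens.
_SAFE_USERNAME = re.compile(r"[A-Za-z0-9_-]{1,64}")


def user_persist_dir(base: str, username: str) -> str:
    """Return the per-user Chroma persist directory, rejecting unsafe names."""
    if not _SAFE_USERNAME.fullmatch(username):
        # Blocks values such as "../admin" or "a/b" that would escape `base`.
        raise ValueError(f"unsafe username: {username!r}")
    # Each user gets their own subdirectory under the shared base path.
    return str(Path(base) / username)
```

The returned string can be passed as persist_directory to both Chroma.from_documents(...) and Chroma(...), so the write path and the read path are always computed the same way.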
ADDITION
When using it in LangChain as a retriever, you can call the as_retriever() method directly. If you also want to restrict which source documents are searched, add a filter to the search_kwargs parameter:
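# Only retrieve chunks whose 'source' metadata is one of these files
pdf_paths = ['1.pdf', '2.pdf']
search_kwargs = {
    "k": 3,         # number of documents returned
    "fetch_k": 10,  # candidates fetched before MMR re-ranking
    "filter": {"source": {"$in": pdf_paths}},
}
retriever = vectordb.as_retriever(
    search_type="mmr",
    search_kwargs=search_kwargs,
)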
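If you instead keep all users in one collection and tag each chunk with a 'user' metadata field, the source filter above has to be combined with the user condition. As far as I know, Chroma's where filter accepts only one top-level key, so a dict like {'user': ..., 'source': ...} is rejected and the conditions need to be wrapped in $and. A small sketch of a hypothetical helper, per_user_search_kwargs, that builds the search_kwargs; the 'user' and 'source' metadata keys are assumptions matching the examples in this thread.

```python
from typing import Any, Dict, List, Optional


def per_user_search_kwargs(
    user: str,
    sources: Optional[List[str]] = None,
    k: int = 3,
    fetch_k: int = 10,
) -> Dict[str, Any]:
    """Build search_kwargs for as_retriever() restricted to one user.

    Assumes each chunk was stored with a 'user' metadata field and,
    optionally, a 'source' field.
    """
    conditions: List[Dict[str, Any]] = [{"user": user}]
    if sources:
        conditions.append({"source": {"$in": sources}})
    # A single condition is used as-is; several are combined with an explicit $and.
    where = conditions[0] if len(conditions) == 1 else {"$and": conditions}
    return {"k": k, "fetch_k": fetch_k, "filter": where}
```

Usage would look like vectordb.as_retriever(search_type="mmr", search_kwargs=per_user_search_kwargs("ankush", ["1.pdf", "2.pdf"])).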
Upvotes: 1
Reputation: 156
We can use a metadata filter to let Chroma support multiple users.
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
persist_directory = 'your_db'
embeddings = OpenAIEmbeddings()
vectordb = Chroma(embedding_function=embeddings,
                  persist_directory=persist_directory)
# Tag every text with its owner in the metadata
vectordb.add_texts(["i worked at kensho"], metadatas=[{"user": "harrison"}])
vectordb.add_texts(["i worked at facebook"], metadatas=[{"user": "ankush"}])
# This will only get documents for Ankush
vectordb.as_retriever(search_kwargs={'filter': {'user': 'ankush'}}).invoke(
    "where did i work?"
)
# Returns: [Document(page_content='i worked at facebook', metadata={'user': 'ankush'})]
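To make the effect of the filter concrete, here is a toy, in-memory stand-in. It is not Chroma's implementation: the 2-d vectors are made up and the search is brute-force cosine similarity. It only shows the idea that entries owned by other users are dropped before ranking, so they cannot appear even when they are the closest match to the query.

```python
import math
from typing import Dict, List, Tuple

# Toy stand-in for a vector store: (text, embedding, metadata) triples.
# The 2-d vectors are invented purely for illustration.
Store = List[Tuple[str, List[float], Dict[str, str]]]


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def search(store: Store, query: List[float], user: str, k: int = 4) -> List[str]:
    # Keep only this user's entries first, then rank the survivors by similarity.
    candidates = [(text, emb) for text, emb, meta in store if meta.get("user") == user]
    candidates.sort(key=lambda item: cosine(query, item[1]), reverse=True)
    return [text for text, _ in candidates[:k]]


store: Store = [
    ("i worked at kensho", [1.0, 0.0], {"user": "harrison"}),
    ("i worked at facebook", [0.6, 0.8], {"user": "ankush"}),
]

# The query vector is closest to Harrison's text, yet only Ankush's is returned.
print(search(store, [1.0, 0.1], user="ankush"))  # ['i worked at facebook']
```

The practical consequence is that isolation depends entirely on the user value placed in the filter: it should come from the authenticated session on the server side, never from something the client can choose.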
Upvotes: 3