Reputation: 51
I want to attach additional metadata to the documents that are embedded and loaded into Chroma.
I can't find a way to add metadata to documents loaded using Chroma.from_documents(documents, embeddings).
For example, suppose I have a text file with details about a particular disease; I want to add species as metadata, listing all the species it affects.
As a workaround, I loaded the documents into a chromadb collection with the required metadata and persisted it:
client = chromadb.PersistentClient(path="chromaDB")
collection = client.get_or_create_collection(
    name="test",
    embedding_function=openai_ef,
    metadata={"hnsw:space": "cosine"},
)
collection.add(
    documents=documents,
    ids=ids,
    metadatas=metadata,
)
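The snippet above does not show how the documents, ids, and metadata lists were built. A minimal sketch of one way to do it, assuming one record per text file and scalar metadata values like those in the output shown in the question. The records list and the build_inputs helper are hypothetical names used only for illustration, and joining several species into one comma-separated string is an assumption, not something taken from the original code.

```python
def build_inputs(records):
    """Turn (source, text, species) records into the three parallel lists
    that collection.add() receives: documents, ids, and metadatas."""
    documents, ids, metadatas = [], [], []
    for i, (source, text, species) in enumerate(records):
        documents.append(text)
        ids.append(f"id{i}")
        # Assumption: keep metadata values scalar by joining the species names.
        metadatas.append({"species": ", ".join(species), "source": source})
    return documents, ids, metadatas


# Hypothetical input: file name, file text, and the species the disease affects.
records = [
    ("Flu.txt", "Details about flu ...", ["XYZ"]),
    ("Common_cold.txt", "Details about the common cold ...", ["ABC"]),
]
documents, ids, metadatas = build_inputs(records)
```

The three lists stay index-aligned, so entry i of metadatas describes entry i of documents.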
This was the result,
collection.get(include=['embeddings','metadatas'])
Output:
{'ids': ['id0',
'id1'],
'embeddings': [[-0.014580891467630863,
0.0003901976451743394,
0.00793908629566431,
-0.027648288756608963,
-0.009689063765108585,
0.010222840122878551,
-0.00946609303355217,
-0.002771923551335931,
-0.04675614833831787,
-0.02056729979813099,
0.014364678412675858,
...
'metadatas': [{'species': 'XYZ', 'source': 'Flu.txt'},
{'species': 'ABC', 'source': 'Common_cold.txt'}],
'documents': None,
'uris': None,
'data': None}
Next I tried loading it from the directory persisted on disk using the LangChain Chroma class:
db = Chroma(persist_directory="chromaDB", embedding_function=embeddings)
But nothing appears to be loaded. db.get() returns this:
db.get(include=['metadatas'])
Output:
{'ids': [],
'embeddings': None,
'metadatas': [],
'documents': None,
'uris': None,
'data': None}
Please help. I need to attach metadata to the documents being loaded.
Upvotes: 4
Views: 5329
Reputation: 723
Try deleting the collection with client.delete_collection(collection_name) first, then run your code.
The add method somehow does not add metadata to documents that already exist in your collection. Tested using chromadb==0.5.5.
Upvotes: 1
Reputation: 21
I would recommend adding the metadata to each document you are loading. That makes it clear exactly which metadata belongs to which piece of content.
# embeddings is assumed to be the same embedding object used in the question
documents = [Document(page_content="Some content", metadata={"language": "EN", "author": "Unknown"})]
db = Chroma.from_documents(documents=documents, embedding=embeddings)
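If the documents come from a loader instead of being built by hand, the same idea can be applied by updating each document's metadata dict before passing the list to from_documents. A rough sketch, assuming each document exposes a .metadata dict with a 'source' key, as the output in the question suggests. The add_species helper and the species_by_source mapping are hypothetical, and SimpleNamespace is only a stand-in for a LangChain Document so the snippet runs without extra packages.

```python
from types import SimpleNamespace


def add_species(docs, species_by_source):
    """Attach a 'species' entry to each document whose source is in the mapping."""
    for doc in docs:
        source = doc.metadata.get("source")
        if source in species_by_source:
            doc.metadata["species"] = species_by_source[source]
    return docs


# Stand-ins for loaded documents (assumption: real ones carry a 'source' key).
docs = [
    SimpleNamespace(page_content="Details about flu ...", metadata={"source": "Flu.txt"}),
    SimpleNamespace(page_content="Details about the common cold ...", metadata={"source": "Common_cold.txt"}),
]
docs = add_species(docs, {"Flu.txt": "XYZ", "Common_cold.txt": "ABC"})
```

Documents whose source is missing from the mapping are left unchanged.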
Upvotes: 2
Reputation: 59
I had an older version of Chroma DB that returned None for the metadata.
I had to pip install the most recent version:
pip install -U chromadb
Upvotes: 0
Reputation: 51
Found the answer myself.
I hadn't specified the collection name when loading. Without it, the LangChain wrapper opens its own default collection rather than the one I created, so get() comes back empty.
Instead of this:
db = Chroma(persist_directory="chromaDB", embedding_function=embeddings)
do this:
db = Chroma(persist_directory="chromaDB", embedding_function=embeddings, collection_name='your_collection_name')
In my case, the collection name is 'test'.
Upvotes: 1