hollow_coder

Reputation: 51

Can I not add metadata to documents loaded using Chroma.from_documents()?

I want to attach additional metadata to the documents that are embedded and loaded into Chroma, but I can't find a way to do so when loading them with
Chroma.from_documents(documents, embeddings)
For example, given a text file describing a particular disease, I want to add a species metadata field that lists all the species the disease affects.

As a workaround, I loaded the documents directly into a chromadb collection, along with the required metadata, and persisted it:

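import chromadb

# openai_ef is an embedding function defined elsewhere
client = chromadb.PersistentClient(path="chromaDB")

collection = client.get_or_create_collection(name="test",
                                             embedding_function=openai_ef,
                                             metadata={"hnsw:space": "cosine"})
# documents, ids and metadata are parallel lists, one entry per document
collection.add(
    documents=documents,
    ids=ids,
    metadatas=metadata
)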

This was the result:

collection.get(include=['embeddings','metadatas'])

Output:

{'ids': ['id0',
'id1'],
'embeddings': [[-0.014580891467630863,
0.0003901976451743394,
0.00793908629566431,
-0.027648288756608963,
-0.009689063765108585,
0.010222840122878551,
-0.00946609303355217,
-0.002771923551335931,
-0.04675614833831787,
-0.02056729979813099,
0.014364678412675858,
...
'metadatas': [{'species': 'XYZ', 'source': 'Flu.txt'},
{'species': 'ABC', 'source': 'Common_cold.txt'}],
'documents': None,
'uris': None,
'data': None}

I then tried loading it from the directory persisted on disk using the Chroma class:

db = Chroma(persist_directory="chromaDB", embedding_function=embeddings)

But nothing appears to be loaded. db.get() returns this:

db.get(include=['metadatas'])

Output:

{'ids': [],
'embeddings': None,
'metadatas': [],
'documents': None,
'uris': None,
'data': None}

Please help. I need to attach metadata to the documents being loaded.

Upvotes: 4

Views: 5329

Answers (4)

Try deleting the collection with client.delete_collection(collection_name) first, then run your code.

In my testing, the add method did not add metadata to documents whose IDs already existed in the collection. Tested using chromadb==0.5.5.

Upvotes: 1

Pieler

Reputation: 21

I recommend adding the metadata to each Document you are loading. That makes it explicit which metadata belongs to which piece of content.

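from langchain_core.documents import Document  # import path may vary with your LangChain version

documents = [Document(page_content="Some content", metadata={"language": "EN", "author": "Unknown"})]
db = Chroma.from_documents(documents=documents, embedding=embeddings)  # embeddings as in the question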
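For list-valued metadata such as the species list in the question, one option is to flatten the list into a single string before attaching it. This is a sketch that assumes the vector store accepts only scalar metadata values (strings, numbers, booleans); flatten_metadata and the species names are hypothetical, chosen only for illustration.

```python
def flatten_metadata(metadata, sep=", "):
    """Return a copy of metadata with list or tuple values joined into one string."""
    flat = {}
    for key, value in metadata.items():
        if isinstance(value, (list, tuple)):
            # Join the items so the value becomes a plain string.
            flat[key] = sep.join(str(item) for item in value)
        else:
            flat[key] = value
    return flat


meta = flatten_metadata({"species": ["human", "pig", "bird"], "source": "Flu.txt"})
print(meta)
```

The resulting dict can be passed as the metadata argument of a Document. The trade-off is that filtering on a single species then needs a substring match rather than an exact match.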

Upvotes: 2

Stefan

Reputation: 59

I had an older version of Chroma DB that returned None for the metadata.

Upgrading to the most recent version fixed it:

pip install -U chromadb

Upvotes: 0

hollow_coder

Reputation: 51

Found the answer myself.

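I hadn't specified the collection name when loading. Without it, the LangChain Chroma wrapper opens its own default collection (named "langchain") instead of the one I created, so it found nothing.

Instead of this: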

db = Chroma(persist_directory="chromaDB", embedding_function=embeddings)

do this:

db = Chroma(persist_directory="chromaDB", embedding_function=embeddings, collection_name='your_collection_name')

In my case, the collection name is 'test'.

Upvotes: 1
