Richard

Reputation: 21

Chromadb: Why do results of collection.query() and collection.get() differ?

I am using Chromadb Version 0.5.23

print(collection.query(...))

produces something like:

{'ids': [['id1', 'id2', 'id3']], 'embeddings': None, 'documents': None, 'uris': None, 'data': None, 'metadatas': None, 'distances': [[0.2003527583406446, 0.21832232106694371, 0.23420078419011314]], 'included': [<IncludeEnum.distances: 'distances'>]}

This is a dict with lists of lists.

print(collection.get(...))

produces something like:

{'ids': ['id1', 'id2', 'id3'], 'embeddings': None, 'documents': ['Text1', 'Text2', 'Text3'], 'uris': None, 'data': None, 'metadatas': None, 'included': [<IncludeEnum.documents: 'documents'>]}

A dict with lists.

Is there a special reason for this behavior? Is it a bug or a feature?

I would expect the results to have the same format. Moreover, I do not see a reason for lists that contain only a single element.

Upvotes: 0

Views: 79

Answers (2)

Richard

Reputation: 21

Looks like a typo helped me find the answer myself!

collection.query(query_texts = ['first query', 'second query'])

accepts multiple query texts, and each one produces its own list of results. Therefore the result contains one inner list per query:

{'ids': [[results for first query], [results for second query], ...], ...}

On the other hand

collection.get()

is not tied to a list of queries, so it returns a single flat list of results.
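To make the nesting concrete, here is a minimal sketch that splits a query()-style result into one dict per query. It works on a plain dict shaped like the output in the question, so it does not need a running collection. The helper name split_query_result and the sample values for the second query are made up for illustration; they are not part of chromadb.

```python
# Hypothetical helper (not part of chromadb): turns a query()-style result,
# where each field holds one inner list per query, into a list of
# get()-style dicts, one per query.
def split_query_result(result, keys=("ids", "documents", "metadatas", "distances")):
    n_queries = len(result["ids"])  # 'ids' is always present, one inner list per query
    per_query = []
    for i in range(n_queries):
        entry = {}
        for key in keys:
            value = result.get(key)
            # Fields that were not requested via `include` come back as None
            entry[key] = None if value is None else value[i]
        per_query.append(entry)
    return per_query


# Sample data shaped like the output in the question, extended to two queries.
# The second query's ids and all distances are invented for this example.
sample = {
    "ids": [["id1", "id2", "id3"], ["id7", "id2", "id9"]],
    "documents": None,
    "metadatas": None,
    "distances": [[0.20, 0.22, 0.23], [0.11, 0.35, 0.40]],
}

results = split_query_result(sample)
print(results[0]["ids"])        # ['id1', 'id2', 'id3']
print(results[1]["distances"])  # [0.11, 0.35, 0.4]
```

With a single query text you would simply take index 0 of each field, e.g. result['ids'][0].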

Upvotes: 0

vht981230

Reputation: 4946

From the documentation, it looks like the difference in format between .query and .get is expected. The section "Choosing Which Data is Returned" mentions that embeddings will be returned as a 2-D NumPy array in .get and as a Python list of 2-D NumPy arrays in .query.

Upvotes: 0
