Trevor Miller

Reputation: 119

Iterating and Counting Records in Milvus Collection without Double Connection

I am trying to iterate through all records in my collection, so here is the code I attempted:

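from pymilvus import Collection

# Assumes the "source" alias was already registered with connections.connect(alias="source", ...)
source_collection = Collection(source["collection"], using="source")

iterator = source_collection.query_iterator(
    batch_size=10,
    output_fields=["*"],
)

while True:
    result = iterator.next()
    if not result:
        # An empty batch means the iterator is exhausted
        iterator.close()
        break
    # <do something with result>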

Before I start iterating, I also want to retrieve the total number of records in the collection so that I can track my progress. I believe I need MilvusClient for that, as in the following code:

source_client = MilvusClient(uri=source["endpoint"], token=source["token"])
response = source_client.query(
    collection_name=source["collection"], output_fields=["count(*)"]
)

Is there a way to avoid connecting twice, once for Collection and once for MilvusClient?

Upvotes: 1

Views: 429

Answers (2)

BarzanHayati

Reputation: 961

First establish a connection to Milvus and create the Collection object. Then, to retrieve the number of entities in the collection, use this line of code:

collection.num_entities

And for the count of entities in a partition, use this line of code:

collection.partition(partition_name).num_entities

Update 1: November 5, 2024

If data has been deleted from the collection, num_entities does not report the correct number of rows until a compaction has been performed on the data. Does Milvus actually perform deduplication?
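To see whether deletions are affecting the number, one option is to compare num_entities with a count(*) query on the same Collection object. The following is only a minimal sketch: compare_counts is a hypothetical helper name, and it assumes (as far as I know) that the count(*) query already excludes deleted rows.

```python
def compare_counts(collection):
    """Return both counts for a pymilvus Collection-like object.

    `collection` is assumed to expose `num_entities` and `query()`,
    as pymilvus.Collection does. The helper name is hypothetical.
    """
    # Statistics-based count; may still include deleted rows before compaction
    stats_count = collection.num_entities

    # Query-based count; an empty expr counts every row in the collection
    rows = collection.query(expr="", output_fields=["count(*)"])
    query_count = rows[0]["count(*)"]

    return {
        "num_entities": stats_count,
        "count(*)": query_count,
        "difference": stats_count - query_count,
    }
```

A non-zero difference would suggest that deleted rows have not been compacted away yet.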

Upvotes: 0

rachel song

Reputation: 54

I believe MilvusClient does not currently provide an iterator interface. However, to get the row count of a collection, you can use collection.query() instead of MilvusClient.query():

results = collection.query(expr="", output_fields=["count(*)"])
print(results)
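Putting the count and the iterator together on one Collection object could look roughly like the sketch below, so only the single "source" connection is needed. The function name iterate_with_progress and the handle_batch callback are hypothetical, introduced just for illustration.

```python
def iterate_with_progress(collection, batch_size=10, handle_batch=None):
    """Count the rows once, then iterate in batches and report progress.

    `collection` is assumed to be a pymilvus Collection bound to an
    existing connection alias. `handle_batch` is a hypothetical callback
    that receives each non-empty batch of rows.
    """
    # Total row count through the same Collection object (no MilvusClient)
    total = collection.query(expr="", output_fields=["count(*)"])[0]["count(*)"]

    iterator = collection.query_iterator(
        batch_size=batch_size,
        output_fields=["*"],
    )
    processed = 0
    try:
        while True:
            batch = iterator.next()
            if not batch:
                # An empty batch means there is nothing left to read
                break
            if handle_batch is not None:
                handle_batch(batch)
            processed += len(batch)
            print(f"{processed}/{total} records processed")
    finally:
        # Release the iterator even if handle_batch raises
        iterator.close()
    return processed, total
```

With the setup from the question, this might be called as iterate_with_progress(Collection(source["collection"], using="source")) after a single connections.connect(alias="source", ...) call.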

Upvotes: 1
