Reputation: 119
I am trying to iterate through all records in my collection, so here is the code I attempted:
from pymilvus import Collection

source_collection = Collection(source["collection"], using="source")
iterator = source_collection.query_iterator(
    batch_size=10,
    output_fields=["*"],
)
while True:
    result = iterator.next()
    if not result:
        # An empty batch means the iterator is exhausted.
        iterator.close()
        break
    # <do something with result>
I then wanted to retrieve the total count of the records I have in my collection before starting to iterate, so that it can help me with tracking my progress. I believe I need to use MilvusClient to assist me like the following code:
source_client = MilvusClient(uri=target["endpoint"], token=target["token"])
response = source_client.query(
    collection_name=source["collection"], output_fields=["count(*)"]
)
Is there a way to avoid connecting twice, once with Connection (for the Collection) and once with MilvusClient?
Upvotes: 1
Views: 429
Reputation: 961
First establish a connection to Milvus and create the Collection object. Then, to retrieve the number of entities in the collection, use:
collection.num_entities
And for the count of entities in a partition, use this line of code:
collection.partition(partition_name).num_entities
Update 1: November 5, 2024
If data has been deleted from the collection, num_entities does not report the correct number of rows until a compaction is performed on the data. See the related question "Does Milvus actually perform deduplication?"
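To see whether the two numbers have drifted apart after deletes, one option is to read num_entities and run a count(*) query side by side. This is only a sketch: compare_counts is a hypothetical helper name, and it assumes the count(*) query returns a single row shaped like {"count(*)": N}.

```python
def compare_counts(collection):
    """Return (num_entities, counted_rows) for a pymilvus-Collection-like object.

    Hypothetical helper: `collection` only needs a `num_entities` attribute
    and a `query()` method, as the pymilvus ORM Collection provides.
    """
    # Statistic discussed above; may lag behind after deletes.
    reported = collection.num_entities
    # Assumption: the count(*) query returns one row, e.g. [{"count(*)": 42}].
    rows = collection.query(expr="", output_fields=["count(*)"])
    counted = rows[0]["count(*)"]
    return reported, counted
```

If the two values differ, that is a hint that deleted rows are still included in num_entities.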
Upvotes: 0
Reputation: 54
I believe MilvusClient does not currently provide an iterator interface. However, to get the row count of a collection, you can use collection.query() instead of MilvusClient.query():
results = collection.query(expr="", output_fields=["count(*)"])
print(results)
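Putting the count and the iterator together on the same Collection object could look roughly like the sketch below, so only one connection is needed. iterate_with_progress is a hypothetical helper name, and the code assumes the count(*) query returns one row shaped like {"count(*)": N}.

```python
def iterate_with_progress(collection, batch_size=10, output_fields=("*",)):
    """Yield (processed_so_far, total, batch) for each batch of a collection.

    Hypothetical helper: `collection` is expected to behave like a pymilvus
    ORM Collection, i.e. expose query() and query_iterator().
    """
    # Assumption: count(*) returns a single row, e.g. [{"count(*)": 42}].
    rows = collection.query(expr="", output_fields=["count(*)"])
    total = rows[0]["count(*)"]

    iterator = collection.query_iterator(
        batch_size=batch_size,
        output_fields=list(output_fields),
    )
    processed = 0
    try:
        while True:
            batch = iterator.next()
            if not batch:
                # An empty batch means there is nothing left to read.
                break
            processed += len(batch)
            yield processed, total, batch
    finally:
        # Close the iterator even if the caller stops early or an error occurs.
        iterator.close()


# Hypothetical usage (needs pymilvus and a reachable Milvus instance):
# from pymilvus import Collection, connections
# connections.connect(alias="source", uri=MY_URI, token=MY_TOKEN)
# coll = Collection("my_collection", using="source")
# for done, total, batch in iterate_with_progress(coll):
#     print(f"{done}/{total}")
```

The helper takes the collection as an argument so the same connection serves both the count and the iteration.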
Upvotes: 1