Reputation: 11
I am trying to write a function that moves documents from collection_one to collection_two. I have run into an odd error where the counts don't add up.
collection_one.count({}) returns 3.3 million records. After moving all the documents, collection_two.count({}) returns 3.2 million.
Each document in collection_one contains a unique uuid field, run_id. When I run the following commands, these are the outputs:
collection_one.count({'run_id': { $eq : 'uuid'}}) returns 3.2 million
collection_one.count({'run_id': { $ne : 'uuid'}}) returns 0
Basically, there are 0.1 million missing records, which only show up in the unfiltered count, count({}).
I have tried moving the documents in a couple of different ways via pymongo and using copyTo() in the shell.
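    for doc in source.find():
        try:
            # insert() is deprecated in PyMongo 3 and removed in 4; insert_one() replaces it
            target.insert_one(doc)
        except Exception as exc:
            print('Did not copy', exc)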
and a batch move function
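    from pymongo import InsertOne
    from pymongo.errors import BulkWriteError

    for n in range(0, ceil_num_of_batches):
        # Fetch one page; skip/limit without a sort has no guaranteed order
        result = source.find(data_filter).limit(batch_size).skip(n * batch_size)
        insert_queries = [InsertOne(doc) for doc in result]
        try:
            target.bulk_write(insert_queries)
        except BulkWriteError as bwe:
            logger.error(bwe.details)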
Both of these produce the same count mismatch. copyTo(), however, copies all 3.3 million but has been deprecated.
collection_two has a unique index, but collection_one doesn't.
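Because the target has a unique index, one thing worth checking is whether the logged BulkWriteError details contain duplicate-key errors. bulk_write() defaults to ordered=True, which stops at the first failed operation, so the rest of that batch is never attempted. Below is a minimal sketch, not a definitive diagnosis: summarize_bulk_details is a hypothetical helper name, and it assumes the usual shape of the bwe.details dict (a "writeErrors" list plus an "nInserted" counter).

```python
DUPLICATE_KEY = 11000  # MongoDB server error code for a unique-index violation


def summarize_bulk_details(details):
    """Summarize the dict found on BulkWriteError.details (hypothetical helper)."""
    errors = details.get("writeErrors", [])
    # Positions, within the submitted batch, of operations rejected as duplicates
    duplicates = [e["index"] for e in errors if e.get("code") == DUPLICATE_KEY]
    return {
        "inserted": details.get("nInserted", 0),
        "duplicate_indexes": duplicates,
        "other_errors": len(errors) - len(duplicates),
    }


# Usage inside the batch loop (assumes the names from the question):
# try:
#     target.bulk_write(insert_queries, ordered=False)  # keep going past failures
# except BulkWriteError as bwe:
#     logger.error(summarize_bulk_details(bwe.details))
```

With ordered=False the server attempts every operation in the batch and reports all failures together, which makes it easier to see how many documents were actually rejected.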
Upvotes: 1
Views: 957
Reputation: 1348
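You need to use countDocuments (count_documents in pymongo) to get an accurate document count. count() without a filter uses internal collection metadata and may not always give an accurate result, for example after an unclean shutdown or on a sharded cluster.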
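A minimal sketch of the comparison in pymongo follows. compare_counts is a hypothetical helper name, and the database and collection names in the usage comment are assumptions. count_documents({}) runs an actual count over the documents, while estimated_document_count() reads the collection metadata.

```python
def compare_counts(collection):
    """Return (exact, estimated) document counts for a PyMongo collection."""
    exact = collection.count_documents({})             # counts the documents themselves
    estimated = collection.estimated_document_count()  # reads collection metadata
    return exact, estimated


# Usage (assumes a reachable MongoDB server; "mydb" is a placeholder name):
# from pymongo import MongoClient
# db = MongoClient()["mydb"]
# exact, estimated = compare_counts(db["collection_one"])
# print(exact, estimated, estimated - exact)
```

If the two numbers differ for collection_one, the 0.1 million "missing" records may never have existed as documents, and the copy itself could be complete.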
Upvotes: 1