Reputation: 27
I'm trying to update all documents in a collection. (~20m documents). Essentially the date string is stored as a month/day/year, and I want to instead store a date object so that I can sort based on that later. I've tried doing:
for document in collection.find({}, no_cursor_timeout=True):
date = document['Date']
dateobj = datetime.strptime(date, "%m/%d/%Y")
doc = col.find_one_and_update(
{"_id": document["_id"]},
{"$set":
{"DateObj_Sort": dateobj}
}, upsert=False
)
count = count + 1
print("Count is ", count)
But the problem is, the cursor gives me no cursor id found at around 180k documents
pymongo.errors.CursorNotFound: cursor id 361490694651426388 not found, full error: {'operationTime': Timestamp(1606772819, 18), 'ok': 0.0, 'errmsg': 'cursor id 361490694651426388 not found', 'code': 43, 'codeName': 'CursorNotFound', '$clusterTime': {'clusterTime': Timestamp(1606772819, 18), 'signature': {'hash': b'\xf7\xca*^\xea\xa8\xc3\xda\xa1\xa6\xf6\xec\x0cAQT\xa34T\x88', 'keyId': 6867170402951495684}}}
How can I update all the documents in a collection when there are soo many?
Upvotes: 0
Views: 48
Reputation: 38992
Offload conversion to the Database server.
Afterall date conversion gives similar results as new Date
in Javascript.
> new Date('04/01/2020');
<- Wed Apr 01 2020 00:00:00 GMT+0200 (Mitteleuropäische Sommerzeit) {}
In an Aggregation pipeline, you can merge results to documents in existing collection like so:
date_conversion_stage = {
"$addFields": {
"DateObj_Sort": { "$toDate": "$Date" }
}
}
merge_stage = {
"$merge": {
"into": "<replace_with_collection_name>",
"whenMatched": "merge"
}
}
collection.aggregate([
date_conversion_stage,
merge_stage,
])
Upvotes: 1