emifenac
emifenac

Reputation: 27

Updating all documents in a collection (million+) with Date object

I'm trying to update all documents in a collection. (~20m documents). Essentially the date string is stored as a month/day/year, and I want to instead store a date object so that I can sort based on that later. I've tried doing:

for document in collection.find({}, no_cursor_timeout=True):
date = document['Date']
dateobj = datetime.strptime(date, "%m/%d/%Y")
 doc = col.find_one_and_update(
     {"_id": document["_id"]},
     {"$set":
      {"DateObj_Sort": dateobj}
      }, upsert=False
 )
count = count + 1
print("Count is ", count)

But the problem is, the cursor gives me no cursor id found at around 180k documents

pymongo.errors.CursorNotFound: cursor id 361490694651426388 not found, full error: {'operationTime': Timestamp(1606772819, 18), 'ok': 0.0, 'errmsg': 'cursor id 361490694651426388 not found', 'code': 43, 'codeName': 'CursorNotFound', '$clusterTime': {'clusterTime': Timestamp(1606772819, 18), 'signature': {'hash': b'\xf7\xca*^\xea\xa8\xc3\xda\xa1\xa6\xf6\xec\x0cAQT\xa34T\x88', 'keyId': 6867170402951495684}}}

How can I update all the documents in a collection when there are soo many?

Upvotes: 0

Views: 48

Answers (1)

Oluwafemi Sule
Oluwafemi Sule

Reputation: 38992

Offload conversion to the Database server. Afterall date conversion gives similar results as new Date in Javascript.

> new Date('04/01/2020');

<- Wed Apr 01 2020 00:00:00 GMT+0200 (Mitteleuropäische Sommerzeit) {}

In an Aggregation pipeline, you can merge results to documents in existing collection like so:

date_conversion_stage = {
    "$addFields": {
        "DateObj_Sort": { "$toDate": "$Date" }
    }
}

merge_stage = {
    "$merge": {
        "into": "<replace_with_collection_name>",
        "whenMatched": "merge"
    }
}

collection.aggregate([
    date_conversion_stage,
    merge_stage,
])

Upvotes: 1

Related Questions