Reputation: 942
Assume a MongoDB collection whose documents must regularly be updated with new fields or subobjects; if a document does not exist yet, the same update process should insert it instead (a typical upsert).
What is the fastest way of achieving this? At the moment I have a three-stage process which is very slow:
Stage 1: find the documents which must be updated based on a list containing their customIDs (there exists an index on the customID field).
db[myCollection].find({'customID': {'$in': myUpdateList}})
Stage 2: iterate over the documents in the cursor retrieved in Stage 1, enriching them with new fields and/or subobjects. Documents which cannot be updated yet, because they are not in the database, are added to the same document list (roughly as sketched below).
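Roughly like this (enrich() is just a placeholder for my actual enrichment logic; it returns something like {'enrichedBody': ...}):
enrichedDocs = {}
cursor = db[myCollection].find({'customID': {'$in': myUpdateList}})
for doc in cursor:
    # enrich documents that already exist in the collection
    enrichedDocs[doc['customID']] = enrich(doc)  # enrich() is a placeholder
for customID in set(myUpdateList) - set(enrichedDocs):
    # documents not in the database yet are built from scratch
    enrichedDocs[customID] = enrich({'customID': customID})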
Stage 3: upsert to MongoDB using an Unordered Bulk Operation.
bulk_mapping = db[myCollection].initialize_unordered_bulk_op()
for key, value in enrichedDocs.items():
    bulk_mapping.find({'customID': key}).upsert().update({'$set': {'customID': key, 'enrichedBody': value['enrichedBody']}})
bulk_mapping.execute()
Upvotes: 1
Views: 296
Reputation: 9268
You don't need to first .find() the documents and then .update() them; you can do the update directly with the upsert option, and any document that does not exist yet will simply be inserted.
Try this:
bulk_mapping = db[myCollection].initialize_unordered_bulk_op()
for key, value in enrichedDocs.items():
    # upsert() makes the operation insert the document when no match exists
    bulk_mapping.find({'customID': key}).upsert().update({
        '$set': {
            'customID': key,
            'enrichedBody': value['enrichedBody']
        }
    })
bulk_mapping.execute()
Update
You can use the code below with PyMongo's bulk_write() to achieve the bulk upsert (initialize_unordered_bulk_op() is deprecated in newer PyMongo versions in favour of bulk_write()):
from pymongo import UpdateOne

bulk_operations = []
for key, value in enrichedDocs.items():
    bulk_operations.append(
        UpdateOne({
            'customID': key
        }, {
            '$set': {
                'customID': key,
                'enrichedBody': value['enrichedBody']
            }
        }, upsert=True)
    )
db[myCollection].bulk_write(bulk_operations)
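If you want to keep the unordered semantics of your original bulk op, pass ordered=False; the returned BulkWriteResult also tells you how many documents were matched, modified, or upserted. A short sketch reusing the names above:
# ordered=False lets the server continue past individual failures,
# matching the behaviour of the original unordered bulk op
result = db[myCollection].bulk_write(bulk_operations, ordered=False)
print(result.matched_count, result.modified_count, result.upserted_count)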
Upvotes: 2