Reputation: 27
I have to upsert records in MongoDB. I used some simple logic, but it didn't work. Please help me fix this.
from pymongo import MongoClient
import json
import sys
import os
client = MongoClient('localhost', 9000)
db1 = client['Com_Crawl']
collection1 = db1['All']
posts1 = collection1.posts
ll=[]
f=file(sys.argv[1],'r')
for i in f:
    j=json.loads(i)
    ll.append(j)
#print ll
print len(ll)
count = 0
for l in ll:
    count = count+1
    if count <= 10000:
        print count,l
        print posts1.update({'vtid':l},{'$set': {'processed': 0}},upsert = True,multi = True)
print "**** Success ***"
The file contains 10 million records. The code above added a new field and set its value to 0 for the first 10000 records. How can I process the rest of the records in batches of 10000 per execution?
Upvotes: 0
Views: 698
Reputation: 5648
MongoDB has bulk write operations that update the database in batches. You can queue any number of operations and send them in a single call, although internally they are executed in groups of 1000. See the MongoDB documentation on ordered versus unordered bulk operations, on bulk updates, and on how bulk operations work. With a bulk update, the code becomes:
from pymongo import MongoClient
import json
import sys
client = MongoClient('localhost', 9000)
db1 = client['Com_Crawl']
collection1 = db1['All']
posts1 = collection1.posts
# Legacy bulk API; it was removed in PyMongo 4.
bulk = posts1.initialize_unordered_bulk_op()
ll = []
with open(sys.argv[1], 'r') as f:
    for i in f:
        ll.append(json.loads(i))
print(len(ll))
pending = 0  # operations queued since the last execute()
for l in ll:
    # upsert() must be chained before update(); update() modifies every match (multi).
    bulk.find({'vtid': l}).upsert().update({'$set': {'processed': 0}})
    pending += 1
    if pending == 10000:
        print(bulk.execute())  # sends the batch and prints the result summary
        bulk = posts1.initialize_unordered_bulk_op()  # reinitialise for the next batch
        pending = 0
if pending:
    bulk.execute()  # sends the remaining records; execute() raises if nothing is queued
As Joe D pointed out, you can also skip through the records and update them in bulk.
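Loading all 10 million records into a list before updating is not strictly necessary. The following is a minimal sketch, assuming PyMongo 3 or later (where `bulk_write` and `UpdateMany` are available) and one JSON value per line, as in the question. `read_batches` and `upsert_batches` are hypothetical helper names, not part of PyMongo.

```python
import json
from itertools import islice


def read_batches(lines, batch_size=10000):
    """Yield lists of parsed JSON values, at most batch_size per list."""
    # Blank lines are skipped; every other line is assumed to hold one JSON value.
    parsed = (json.loads(line) for line in lines if line.strip())
    while True:
        batch = list(islice(parsed, batch_size))
        if not batch:
            return
        yield batch


def upsert_batches(collection, lines, batch_size=10000):
    """Send one unordered bulk_write per batch; return the number of batches sent."""
    # Imported here so read_batches() can be used without PyMongo installed.
    from pymongo import UpdateMany

    sent = 0
    for batch in read_batches(lines, batch_size):
        # UpdateMany with upsert=True mirrors multi=True, upsert=True in the question.
        ops = [UpdateMany({'vtid': vtid}, {'$set': {'processed': 0}}, upsert=True)
               for vtid in batch]
        collection.bulk_write(ops, ordered=False)
        sent += 1
    return sent
```

It could be called as `upsert_batches(posts1, open(sys.argv[1]))`, which keeps only one batch of records in memory at a time.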
Upvotes: 0
Reputation: 388
You could do something like this instead.
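count = 0
while True:
    # Sort by _id so that each skip/limit window is stable between queries.
    batch = list(posts1.find({}).sort('_id', 1).skip(count*10000).limit(10000))
    if not batch:
        break
    for post in batch:
        print(posts1.update({'_id': post['_id']}, {'$set': {'processed': 0}}))
    count += 1
print("**** Success ***")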
skip() does exactly what you'd think: it skips that many entries in the queryset, and limit() then limits the results to 10000. So essentially you're using count to get the entries starting at 0, 10000, 20000, etc., and limit() only grabs 10000 after that starting point.
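To make the windows concrete, here is a small sketch that only computes the (skip, limit) pairs. `page_windows` is a hypothetical helper, not part of PyMongo, and the total is assumed to come from a count of the documents in the collection.

```python
def page_windows(total, page_size=10000):
    """Yield (skip, limit) pairs that together cover `total` documents."""
    for start in range(0, total, page_size):
        # The last window may be shorter than page_size.
        yield start, min(page_size, total - start)


print(list(page_windows(25000)))
# prints [(0, 10000), (10000, 10000), (20000, 5000)]
```

Each pair would then be used as `posts1.find({}).skip(skip).limit(limit)`. Note that the server still has to walk past the skipped documents, so later windows tend to be slower than earlier ones.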
Upvotes: 1