Reputation: 181
I am trying to bulk insert documents into MongoDB using the Python library pymongo.
import pymongo
def tryManyInsert():
    p = {'x' : 1, 'y' : True, 'z': None}
    mongoColl = pymongo.MongoClient('localhost', 27017)['test']['multiIn']
    mongoColl.insert_many([p for i in range(10)])
tryManyInsert()
But it keeps failing with a BulkWriteError.
Traceback (most recent call last):
  File "/prog_path/testMongoCon.py", line 9, in <module>
    tryManyInsert();
  File "/prog_path/testMongoCon.py", line 7, in tryManyInsert
    mongoColl.insert_many([p for i in range(10)])
  File "/myenv_path/lib/python3.6/site-packages/pymongo/collection.py", line 724, in insert_many
    blk.execute(self.write_concern.document)
  File "/myenv_path/lib/python3.6/site-packages/pymongo/bulk.py", line 493, in execute
    return self.execute_command(sock_info, generator, write_concern)
  File "/myenv_path/lib/python3.6/site-packages/pymongo/bulk.py", line 331, in execute_command
    raise BulkWriteError(full_result)
pymongo.errors.BulkWriteError: batch op errors occurred
I am trying to insert only 10 docs sequentially without _id, so the conditions in this answer / discussion don't apply here. A similar question has no answer.
I have tried pymongo 3.4 and pymongo 3.5.1; both give the same error. I am on Python 3.6 and MongoDB 3.2.10.
What am I doing wrong here?
Upvotes: 0
Views: 496
Reputation: 151190
Python is still referring to p as the same object for each list element. You want a copy() of p for each element:
import pymongo
from copy import copy
def tryManyInsert():
    p = {'x' : 1, 'y' : True, 'z': None}
    mongoColl = pymongo.MongoClient('localhost', 27017)['test']['multiIn']
    mongoColl.insert_many([copy(p) for i in range(10)])
tryManyInsert()
Or even simply:
mongoColl.insert_many([{ 'x': 1, 'y': True, 'z': None } for i in range(10)])
Unless you do that, the _id only gets assigned once: insert_many() adds a missing _id to the dict in place, and since every list element is that same dict, you are simply repeating "the same document" with the same _id in the argument to insert_many(). Hence the duplicate key error.
As a quick demonstration:
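from bson import ObjectId
p = { 'a': 1 }
def addId(obj):
    # Mutates the dict it is given, much like insert_many() does for a missing _id
    obj['_id'] = ObjectId()
    return obj
# list() is needed on Python 3, where map() returns a lazy iterator
docs = list(map(addId, [p for i in range(2)]))
print(docs)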
Gives you:
[
{'a': 1, '_id': ObjectId('59fbc4a16cb6b30bdb3de0fd')},
{'a': 1, '_id': ObjectId('59fbc4a16cb6b30bdb3de0fd')}
]
Or, more simply, without ObjectId:
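p = { 'a': 1 }
def addKey(x):
    # x is a [document, index] pair; store the index on the document
    x[0]['b'] = x[1]
    return x[0]
# Every pair holds the same dict p, so each call overwrites the same 'b'
docs = list(map(addKey, [[p, i] for i in range(3)]))
print(docs)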
Gives:
[{'a': 1, 'b': 2}, {'a': 1, 'b': 2}, {'a': 1, 'b': 2}]
This shows each index value overwriting the previous one on the same shared dict, so only the last value survives in every element.
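The sharing can also be checked directly, without MongoDB at all. The following is a minimal sketch using only built-ins; the names `shared`, `copied` and the `marker` key are made up for illustration, and `dict(p)` is used here as an equivalent shallow copy of a plain dict:

```python
p = {'x': 1, 'y': True, 'z': None}

# Ten list slots, but only one dict object behind them
shared = [p for i in range(10)]
# dict(p) builds a new shallow copy on every iteration
copied = [dict(p) for i in range(10)]

# Count how many distinct objects each list really holds
distinct_shared = len({id(d) for d in shared})
distinct_copied = len({id(d) for d in copied})
print(distinct_shared, distinct_copied)  # 1 10

# A change made through one element shows up in every shared element
shared[0]['marker'] = 'set once'
print('marker' in shared[9], 'marker' in copied[9])  # True False
```

The first list is what the original insert_many() call received: ten references to a single document.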
But using copy() to take a copy of the value:
from copy import copy
from bson import ObjectId
p = { 'a': 1 }
def addId(obj):
    obj['_id'] = ObjectId()
    return obj
# Each element is now its own dict, so each one gets its own _id
docs = list(map(addId, [copy(p) for i in range(2)]))
print(docs)
Gives you:
[
{'a': 1, '_id': ObjectId('59fbc5466cb6b30be4d0fc00')},
{'a': 1, '_id': ObjectId('59fbc5466cb6b30be4d0fc01')}
]
Or our base demonstration:
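from copy import copy
p = { 'a': 1 }
def addKey(x):
    # x is a [document, index] pair; store the index on the document
    x[0]['b'] = x[1]
    return x[0]
# Each pair holds a fresh copy of p, so every 'b' is kept
docs = list(map(addKey, [[copy(p), i] for i in range(3)]))
print(docs)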
Returns:
[{'a': 1, 'b': 0}, {'a': 1, 'b': 1}, {'a': 1, 'b': 2}]
So this is basically how Python works: names and list elements hold references to objects. Unless you deliberately create a new object, every element refers to the same dict, and each update in the loop modifies that one object rather than producing a "new one".
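One caveat: copy() makes a shallow copy. That is enough for the flat document in the question, but if a document held nested dicts or lists, the copies would still share those inner objects. A minimal sketch of the difference, using copy.deepcopy from the standard library; the nested `tags` field is a hypothetical addition for illustration and is not part of the original question:

```python
from copy import copy, deepcopy

p = {'x': 1, 'tags': ['a']}  # 'tags' is a hypothetical nested field

shallow = [copy(p) for i in range(2)]    # new outer dicts, same inner list
deep = [deepcopy(p) for i in range(2)]   # new outer dicts and new inner lists

shallow[0]['tags'].append('b')  # mutates the list shared by p and every shallow copy
deep[0]['tags'].append('c')     # only touches deep[0]'s own list

print(shallow[1]['tags'])  # ['a', 'b']
print(deep[1]['tags'])     # ['a']
```

For documents with nested values that will be modified per element, deepcopy() or building the dict from a literal inside the comprehension avoids this.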
Upvotes: 1