parthasarathy

Reputation: 181

Bulk insert mapped array using pymongo fails due to BulkWriteError

I am trying to bulk insert documents in MongoDB using python library pymongo.

import pymongo
def tryManyInsert():
    p = {'x' : 1, 'y' : True, 'z': None}
    mongoColl = pymongo.MongoClient('localhost', 27017)['test']['multiIn']
    mongoColl.insert_many([p for i in range(10)])
tryManyInsert()

But I keep failing due to BulkWriteError.

Traceback (most recent call last):
    File "/prog_path/testMongoCon.py", line 9, in <module>
    tryManyInsert();
    File "/prog_path/testMongoCon.py", line 7, in tryManyInsert
mongoColl.insert_many([p for i in range(10)])
    File "/myenv_path/lib/python3.6/site-packages/pymongo/collection.py", line 724, in insert_many
blk.execute(self.write_concern.document)
    File "/myenv_path/lib/python3.6/site-packages/pymongo/bulk.py", line 493, in execute
return self.execute_command(sock_info, generator, write_concern)
    File "/myenv_path/lib/python3.6/site-packages/pymongo/bulk.py", line 331, in execute_command
raise BulkWriteError(full_result)
    pymongo.errors.BulkWriteError: batch op errors occurred

I am inserting only 10 docs sequentially, without an _id, so the conditions in this answer / discussion don't apply here. A similar question has no answer.

I have tried pymongo 3.4 and pymongo 3.5.1; both give the same error. I am on Python 3.6 with MongoDB 3.2.10. What am I doing wrong here?

Upvotes: 0

Views: 496

Answers (1)

Neil Lunn

Reputation: 151190

Python still refers to p as the same object for every array member. You want a copy() of p for each member:

import pymongo
from copy import copy
def tryManyInsert():
    p = {'x' : 1, 'y' : True, 'z': None}
    mongoColl = pymongo.MongoClient('localhost', 27017)['test']['multiIn']
    mongoColl.insert_many([copy(p) for i in range(10)])
tryManyInsert()

Or even simply:

    mongoColl.insert_many([{ 'x': 1, 'y': True, 'z': None } for i in range(10)])

Unless you do that, the _id only gets assigned once, and you are simply repeating "the same document" with the same _id in the argument to insert_many(). Hence the duplicate key error.
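You can check this aliasing directly, with no MongoDB connection needed. The manual '_id' assignment below is hypothetical, just simulating what insert_many() does to the first document it processes:

```python
p = {'x': 1, 'y': True, 'z': None}
docs = [p for i in range(10)]

# Every element of the list is the very same dict object
print(all(d is p for d in docs))   # True

# Simulate insert_many() assigning an _id to the first document:
# it shows up in every "other" element too, since they are all p
docs[0]['_id'] = 'fake-id'
print('_id' in docs[9])            # True
```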

As a quick demonstration:

from bson import ObjectId

p = { 'a': 1 }

def addId(obj):
  obj['_id'] = ObjectId()
  return obj

docs = list(map(addId, [p for i in range(2)]))
print(docs)

Gives you:

[
  {'a': 1, '_id': ObjectId('59fbc4a16cb6b30bdb3de0fd')}, 
  {'a': 1, '_id': ObjectId('59fbc4a16cb6b30bdb3de0fd')}
]

Or more succinctly:

p = { 'a': 1 }

def addKey(x):
  x[0]['b'] = x[1]
  return x[0]

docs = list(map(addKey, [[p, i] for i, p in enumerate([p for i in range(3)])]))
print(docs)

Gives:

[{'a': 1, 'b': 2}, {'a': 1, 'b': 2}, {'a': 1, 'b': 2}]

Which clearly demonstrates each index value overwriting the previous one in the single shared dict.

But using copy() to take a copy of the value:

from copy import copy
from bson import ObjectId

p = { 'a': 1 }

def addId(obj):
  obj['_id'] = ObjectId()
  return obj

docs = list(map(addId, [copy(p) for i in range(2)]))
print(docs)

Gives you:

[
  {'a': 1, '_id': ObjectId('59fbc5466cb6b30be4d0fc00')},
  {'a': 1, '_id': ObjectId('59fbc5466cb6b30be4d0fc01')}
]

Or, revisiting the earlier demonstration:

from copy import copy

p = { 'a': 1 }

def addKey(x):
  x[0]['b'] = x[1]
  return x[0]

docs = list(map(addKey, [[p, i] for i, p in enumerate([copy(p) for i in range(3)])]))
print(docs)

Returns:

[{'a': 1, 'b': 0}, {'a': 1, 'b': 1}, {'a': 1, 'b': 2}]

So this is basically how Python works. Unless you deliberately create a new value, all you are doing is returning the same referenced object and updating that one object on each pass of the loop, rather than producing a "new one".

Upvotes: 1
