user835199

Reputation: 335

Why does db.insert(dict) add _id key to the dict object while using pymongo

I am using pymongo in the following way:

from pymongo import *
a = {'key1':'value1'}
db1.collection1.insert(a)
print a

This prints

{'_id': ObjectId('53ad61aa06998f07cee687c3'), 'key1': 'value1'}

on the console. I understand that _id is added to the mongo document, but why is it added to my Python dictionary too? I did not intend this, and I am wondering what its purpose is. I could be using this dictionary for other purposes too, and it gets updated as a side effect of inserting it into the collection. If I have to, say, serialise this dictionary into a JSON object, I will get a

ObjectId('53ad610106998f0772adc6cb') is not JSON serializable

error. Shouldn't the insert function leave the dictionary unchanged while inserting the document into the db?

Upvotes: 15

Views: 14118

Answers (5)

Utkonos

Reputation: 795

This behavior can be circumvented by using the copy module. This passes a copy of the dictionary to pymongo, leaving the original intact. Based on the code snippet in your example, one should modify it like so:

import copy
from pymongo import *
a = {'key1':'value1'}
db1.collection1.insert(copy.copy(a))
print a
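
To see why the copy helps without needing a running server, here is a minimal sketch. `FakeCollection` is a hypothetical stand-in that only imitates the one behaviour under discussion (adding `_id` to the dict it receives); it is not part of pymongo.

```python
import copy
import uuid


class FakeCollection:
    """Hypothetical stand-in for a pymongo collection (not a real pymongo class)."""

    def insert(self, doc):
        # Imitate the side effect from the question: add an _id in place if missing.
        doc.setdefault("_id", uuid.uuid4().hex)
        return doc["_id"]


collection = FakeCollection()

a = {"key1": "value1"}
collection.insert(a)             # the dict passed in is mutated
mutated = "_id" in a

b = {"key1": "value1"}
collection.insert(copy.copy(b))  # only the copy is mutated
untouched = "_id" not in b
```

Note that `copy.copy` is shallow, so nested dicts are still shared with the original; `copy.deepcopy` avoids that if it matters.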

Upvotes: 0

Sabuhi Shukurov

Reputation: 1920

As @BorrajaX already answered, I just want to add a bit more. _id is a unique identifier: when a document is inserted into the collection without one, an ObjectId is generated for it. You can either set your own _id or use the generated one.

The documentation mentions this.

In your case, you can simply drop this key with the del keyword: del a["_id"].

or

If you need _id for further operations, you can use dumps from the bson.json_util module.

import json
from bson.json_util import dumps as bson_dumps

# Replace the ObjectId with its JSON-friendly form, e.g. {"$oid": "..."}
a["_id"] = json.loads(bson_dumps(a["_id"]))

or

Before inserting the document, you can set your own _id; then you won't need to convert it when serializing the dictionary.

a["_id"] = "some_id"

db1.collection1.insert(a)
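
If the goal is only to get past the `is not JSON serializable` error, another option (not one of the approaches listed above) is the `default` hook of the standard library's `json.dumps`, which is called for any object `json` cannot encode. The sketch below uses a hypothetical `StandInId` class so it runs without pymongo; the assumption is that a real `ObjectId` can be passed the same way, since `str()` of an `ObjectId` is its 24-character hex string.

```python
import json


class StandInId:
    """Hypothetical stand-in for bson.ObjectId so the sketch runs without pymongo."""

    def __init__(self, hex_string):
        self._hex = hex_string

    def __str__(self):
        return self._hex


a = {"key1": "value1", "_id": StandInId("53ad61aa06998f07cee687c3")}

# default=str is called only for values json cannot encode itself.
encoded = json.dumps(a, default=str, sort_keys=True)
print(encoded)
```

The trade-off is that the result is a plain string, so the type information is lost when the JSON is read back; `bson.json_util` keeps it.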

Upvotes: 0

Savir

Reputation: 18418

Like many other database systems out there, Pymongo adds the unique identifier needed to retrieve the data from the database as soon as it is inserted. (What would happen if you inserted two dictionaries with the same content {'key1':'value1'} into the database? How would you tell which one you want?)

This is explained in the Pymongo docs:

When a document is inserted a special key, "_id", is automatically added if the document doesn’t already contain an "_id" key. The value of "_id" must be unique across the collection.

If you want to change this behavior, you could give the object an _id key before inserting. In my opinion, this is a bad idea: it could easily lead to collisions, and you would lose useful information that is stored in a "real" ObjectId, such as the creation time, which is great for sorting and things like that.

>>> a = {'_id': 'hello', 'key1':'value1'}
>>> collection.insert(a)
'hello'
>>> collection.find_one({'_id': 'hello'})
{u'key1': u'value1', u'_id': u'hello'}
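
The creation time mentioned above is encoded in the ObjectId itself: its first 4 bytes are a big-endian Unix timestamp in seconds. As a rough sketch using only the standard library (the helper name `creation_time` is made up here), the timestamp can be read straight from the hex string; with pymongo available, the `generation_time` attribute of an `ObjectId` gives the same information.

```python
from datetime import datetime, timezone


def creation_time(object_id_hex):
    """Return the UTC creation time stored in a 24-character ObjectId hex string."""
    # The first 8 hex digits (4 bytes) are seconds since the Unix epoch.
    seconds = int(object_id_hex[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# ObjectId taken from the question.
created = creation_time("53ad61aa06998f07cee687c3")
print(created.isoformat())
```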

Or, if your problem comes up when serializing to JSON, you can use the utilities in the bson module:

>>> a = {'key1':'value1'}
>>> collection.insert(a)
ObjectId('53ad6d59867b2d0d15746b34')
>>> from bson import json_util, ObjectId
>>> json_util.dumps(collection.find_one({'_id': ObjectId('53ad6d59867b2d0d15746b34')}))
'{"key1": "value1", "_id": {"$oid": "53ad6d59867b2d0d15746b34"}}'

(You can verify that this is valid JSON on sites like jsonlint.com.)

Upvotes: 1

MBarsi

Reputation: 2457

_id acts as a primary key for documents; unlike in SQL databases, it is required in MongoDB.

To make _id serializable, you have 2 options:

  1. Set _id to a JSON-serializable datatype in your documents before inserting them (e.g. int, str), but keep in mind that it must be unique per document.

  2. Use custom BSON serialization encoder/decoder classes:

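    import json

    from bson.json_util import default as bson_default
    from bson.json_util import object_hook as bson_object_hook


    class BSONJSONEncoder(json.JSONEncoder):
        # Fall back to BSON's conversion for types json cannot encode itself.
        def default(self, o):
            return bson_default(o)


    class BSONJSONDecoder(json.JSONDecoder):
        # Turn forms such as {"$oid": ...} back into BSON types while decoding.
        def __init__(self, **kwargs):
            json.JSONDecoder.__init__(self, object_hook=bson_object_hook, **kwargs)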
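
To illustrate what such an encoder/decoder pair does, here is a self-contained sketch of the same pattern using only the standard library. `StandInId`, `StandInEncoder` and `StandInDecoder` are hypothetical names; `StandInId` replaces `ObjectId`, and the `{"$oid": ...}` shape imitates the form that `bson.json_util` produces. With pymongo installed, the classes above would be passed through `cls=` in the same way.

```python
import json


class StandInId:
    """Hypothetical stand-in for bson.ObjectId (keeps the sketch free of pymongo)."""

    def __init__(self, hex_string):
        self.hex_string = hex_string

    def __eq__(self, other):
        return isinstance(other, StandInId) and other.hex_string == self.hex_string


class StandInEncoder(json.JSONEncoder):
    def default(self, o):
        # Called only for objects json cannot encode by itself.
        if isinstance(o, StandInId):
            return {"$oid": o.hex_string}
        return super().default(o)


class StandInDecoder(json.JSONDecoder):
    def __init__(self, **kwargs):
        super().__init__(object_hook=self._hook, **kwargs)

    @staticmethod
    def _hook(dct):
        # Turn {"$oid": "..."} back into the id type; leave other dicts alone.
        if set(dct) == {"$oid"}:
            return StandInId(dct["$oid"])
        return dct


doc = {"key1": "value1", "_id": StandInId("53ad61aa06998f07cee687c3")}
text = json.dumps(doc, cls=StandInEncoder)
restored = json.loads(text, cls=StandInDecoder)
```

Unlike converting the id to a plain string, this round trip gives back an object of the original type.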
    

Upvotes: 0

sundar nataraj

Reputation: 8692

The docs clearly answer your question:

MongoDB stores documents on disk in the BSON serialization format. BSON is a binary representation of JSON documents, though it contains more data types than JSON.

The value of a field can be any of the BSON data types, including other documents, arrays, and arrays of documents. The following document contains values of varying types:

var mydoc = {
               _id: ObjectId("5099803df3f4948bd2f98391"),
               name: { first: "Alan", last: "Turing" },
               birth: new Date('Jun 23, 1912'),
               death: new Date('Jun 07, 1954'),
               contribs: [ "Turing machine", "Turing test", "Turingery" ],
               views : NumberLong(1250000)
            }

To learn more, see the BSON documentation.

Upvotes: -2
