powerPixie

Reputation: 708

bson.errors.InvalidDocument: key '$numberDecimal' must not start with '$' when using json

I have a small json file, with the following lines:

{
    "IdTitulo": "Jaws",
    "IdDirector": "Steven Spielberg",
    "IdNumber": 8,
    "IdDecimal": "2.33"
}

And there is a schema in my db collection, named test_dec. This is what I've used to create the schema:

db.createCollection("test_dec",
{validator: {
    $jsonSchema: {
         bsonType: "object",
         required: ["IdTitulo","IdDirector"],
         properties: {
         IdTitulo: {
                "bsonType": "string",
                "description": "string type, nombre de la pelicula"
            },
         IdDirector: {
                "bsonType": "string",
                "description": "string type, nombre del director"
            },
        IdNumber : {
                "bsonType": "int",
                "description": "number type to test"
            },
        IdDecimal : {
                 "bsonType": "decimal",
                 "description": "decimal type"
                    }
       }
    }}
    })

I've made multiple attempts to insert the data. The problem is in the IdDecimal field value.

Some of the attempts, replacing the IdDecimal line with:

 "IdDecimal": 2.33

 "IdDecimal": {"$numberDecimal": "2.33"}

 "IdDecimal": NumberDecimal("2.33")

None of them work. The second one is the formal solution provided by the MongoDB manual (mongodb-extended-json), and the error it produces is the one in the title of my question: bson.errors.InvalidDocument: key '$numberDecimal' must not start with '$'.

I am currently using Python to load the json. I've been playing around with this file:

import os,sys
import re
import io
import json
from pymongo import MongoClient
from bson.raw_bson import RawBSONDocument
from bson.json_util import CANONICAL_JSON_OPTIONS,dumps,loads
import bsonjs as bs

#connection
client = MongoClient('localhost',27018,document_class=RawBSONDocument)
db     = client['myDB']
coll   = db['test_dec']   
other_col = db['free']                                                                                        

for fname in os.listdir('/mnt/win/load'):                                                                               
    num = re.findall("\d+", fname)

    if num:

       with io.open(fname, encoding="ISO-8859-1") as f:

            doc_data = loads(dumps(f,json_options=CANONICAL_JSON_OPTIONS))

            print(doc_data) 

            test = '{"idTitulo":"La pelicula","idRelease":2019}'
            raw_bson = bs.loads(test)
            load_raw = RawBSONDocument(raw_bson)

            db.other_col.insert_one(load_raw)


client.close()

I am using a json file. If I try to parse anything like Decimal128('2.33') the output is "ValueError: No JSON object could be decoded", because my json has an invalid format.

The result of

    db.other_col.insert_one(load_raw) 

is that the content of "test" is inserted. But I cannot use doc_data with RawBSONDocument; it fails with:

  TypeError: unpack_from() argument 1 must be string or buffer, not list:

When I manage to pass the json directly to RawBSONDocument, I get all the junk with it, and the record in the database looks like this sample:

   {
    "_id" : ObjectId("5eb2920a34eea737626667c2"),
    "0" : "{\n",
    "1" : "\t\"IdTitulo\": \"Gremlins\",\n",
    "2" : "\t\"IdDirector\": \"Joe Dante\",\n",
    "3" : "\t\"IdNumber\": 6,\n",
    "4" : "\"IdDate\": {\"$date\": \"2010-06-18T:00.12:00Z\"}\t\n",
    "5" : "}\n"
     }

It seems it is not that simple to load an extended json into MongoDB. I need the extended version because I want to use schema validation.

Oleg pointed out that it is numberDecimal and not NumberDecimal as I had it before. I've fixed the json file, but nothing changed.

Executed:

with io.open(fname, encoding="ISO-8859-1") as f:
      doc_data = json.load(f)                
      coll.insert(doc_data)

And the json file:

 {
    "IdTitulo": "Gremlins",
    "IdDirector": "Joe Dante",
    "IdNumber": 6,
    "IdDecimal": {"$numberDecimal": "3.45"}
 }

Upvotes: 7

Views: 1129

Answers (4)

Belly Buster

Reputation: 8814

One more roll of the dice from me. If you are using schema validation as you are, I would recommend defining a class and being explicit with defining each field and how you propose to convert the field to the relevant python datatypes. While your solution is generic, the data structure has to be rigid to match the validation.

IMO this is clearer and you have control over any errors etc within the class.

Just to confirm I ran the schema validation and this works with the supplied validation.

from pymongo import MongoClient
from bson.decimal128 import Decimal128
import dateutil.parser
import json

class Film:
    def __init__(self, file):
        # Read the plain JSON file, then convert each field explicitly
        data = file.read()
        loaded = json.loads(data)
        self.IdTitulo  = loaded.get('IdTitulo')
        self.IdDirector = loaded.get('IdDirector')
        # The decimal arrives as a string ("2.33") and is stored as a BSON decimal
        self.IdDecimal = Decimal128(loaded.get('IdDecimal'))
        self.IdNumber = int(loaded.get('IdNumber'))
        self.IdDateTime = dateutil.parser.parse(loaded.get('IdDateTime'))

    def insert_one(self, collection):
        # The instance attributes form the document to insert
        collection.insert_one(self.__dict__)

client = MongoClient()
mycollection = client.mydatabase.test_dec

with open('c:/temp/1.json', 'r') as jfile:
    film = Film(jfile)
    film.insert_one(mycollection)

gives:

> db.test_dec.findOne()
{
        "_id" : ObjectId("5eba79eabf951a15d32843ae"),
        "IdTitulo" : "Jaws",
        "IdDirector" : "Steven Spielberg",
        "IdDecimal" : NumberDecimal("2.33"),
        "IdNumber" : 8,
        "IdDateTime" : ISODate("2020-05-12T10:08:21Z")
}

>

JSON file used:

{
    "IdTitulo": "Jaws",
    "IdDirector": "Steven Spielberg",
    "IdNumber": 8,
    "IdDecimal": "2.33",
    "IdDateTime": "2020-05-12T11:08:21+0100"
}
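
The stored IdDateTime is one hour earlier than the value in the file because the +0100 offset is converted to UTC on the way in. Here is a stdlib-only sketch of the same conversion. It is an illustration rather than part of the answer's code: it uses datetime.strptime in place of dateutil and assumes Python 3.

```python
from datetime import datetime, timezone

raw = "2020-05-12T11:08:21+0100"

# %z parses the +0100 offset, giving a timezone-aware datetime.
local = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%S%z")

# The same instant expressed in UTC, which is how MongoDB stores dates.
as_utc = local.astimezone(timezone.utc)

print(as_utc.isoformat())  # 2020-05-12T10:08:21+00:00
```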

Upvotes: 3

Belly Buster

Reputation: 8814

Could you not just use bson.decimal128.Decimal128? Or am I missing something?

from pymongo import MongoClient
from bson.decimal128 import Decimal128

db = MongoClient()['mydatabase']

data = {
    "IdTitulo": "Jaws",
    "IdDirector": "Steven Spielberg",
    "IdNumber": 8,
    "IdDecimal": "2.33"
}

data['IdDecimal'] = Decimal128(data['IdDecimal'])
db.other_col.insert_one(data)

Upvotes: 0

powerPixie

Reputation: 708

Finally, I've got the solution and it is using RawBSONDocument.

First the json file:

{
    "IdTitulo": "Dead Snow",
    "IdDirector": "Tommy Wirkola",
    "IdNumber": 11,
    "IdDecimal": {"$numberDecimal": "2.22"}
}

& the validation schema file:

db.createCollection("test_dec",
  {validator: {
     $jsonSchema: {
        bsonType: "object",
        required: ["IdTitulo","IdDirector"],
        properties: {
            IdTitulo: {
                "bsonType": "string",
                "description": "string type, nombre de la pelicula"
                },
            IdDirector: {
                "bsonType": "string",
                "description": "string type, nombre del director"
                },
            IdNumber : {
                "bsonType": "int",
                "description": "number type to test"
               },
            IdDecimal : {
                 "bsonType": "decimal",
                 "description": "decimal type"
                }
             }
          }}
   })

So, the collection in this case is "test_dec".

And here is the Python script that opens the ".json" file, reads it and parses it to be imported into MongoDB:

import json
from bson.raw_bson import RawBSONDocument
from pymongo import MongoClient
import bsonjs

#connection
client = MongoClient('localhost',27018)
db     = client['movieDB']
coll   = db['test_dec']

#open and read file
with open('1.json', 'r') as jfile:
    data = jfile.read()

    loaded = json.loads(data)
    dumped = json.dumps(loaded, indent=4)
    bson_bytes = bsonjs.loads(dumped)

    coll.insert_one(RawBSONDocument(bson_bytes))


client.close()

The inserted document:

{
    "_id" : ObjectId("5eb971ec6fbab859dfae8a6f"),
    "IdTitulo" : "Dead Snow",
    "IdDirector" : "Tommy Wirkola",
    "IdDecimal" : NumberDecimal("2.22"),
    "IdNumber" : 11
 }

I don't know how it flipped the fields IdDecimal and IdNumber, but it passes the validation and I am really happy.
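
A possible explanation for the swapped fields, assuming the script ran on Python 2 (the error messages quoted in the question look like Python 2 ones): json.loads there returns a plain dict, which does not preserve key order. This minimal sketch keeps the file's order on any Python version by asking for an OrderedDict:

```python
import json
from collections import OrderedDict

data = '''
{
    "IdTitulo": "Dead Snow",
    "IdDirector": "Tommy Wirkola",
    "IdNumber": 11,
    "IdDecimal": {"$numberDecimal": "2.22"}
}
'''

# object_pairs_hook receives the key/value pairs in file order,
# so the OrderedDict keeps the order written in the .json file.
loaded = json.loads(data, object_pairs_hook=OrderedDict)
dumped = json.dumps(loaded)

print(list(loaded))
```

Since data is already a JSON string, passing it straight to bsonjs.loads without the json.loads/json.dumps round trip should also avoid the reordering, though that is untested here.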

I tried a document with 'hello' instead of a number in NumberDecimal and the insertion resulted in:

 {
    "_id" : ObjectId("5eb973b76fbab859dfae8ecd"),
    "IdTitulo" : "Shining",
    "IdDirector" : "Stanley Kubrick",
    "IdDecimal" : NumberDecimal("NaN"),
    "IdNumber" : 19
  }
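
The NaN above suggests that an invalid $numberDecimal string is not rejected on the way in, so a bad value can still satisfy the "decimal" type check. If that matters, the string can be checked before inserting. Here is a minimal sketch using only the standard library; require_finite_decimal is a hypothetical helper, not part of bsonjs or PyMongo:

```python
from decimal import Decimal, InvalidOperation

def require_finite_decimal(text):
    """Return text unchanged if it parses as a finite decimal, else raise ValueError."""
    try:
        value = Decimal(text)
    except InvalidOperation:
        raise ValueError("not a decimal number: %r" % (text,))
    # Decimal("NaN") and Decimal("Infinity") parse fine, so reject them explicitly.
    if not value.is_finite():
        raise ValueError("NaN and Infinity are not accepted: %r" % (text,))
    return text

doc = {"IdTitulo": "Shining", "IdDecimal": {"$numberDecimal": "hello"}}

try:
    require_finite_decimal(doc["IdDecimal"]["$numberDecimal"])
    accepted = True
except ValueError as exc:
    accepted = False
    print("rejected:", exc)
```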

Thanks to all who tried to help. Especially Oleg!!! Thank you for being so patient.

Upvotes: 0

D. SM

Reputation: 14490

JSON with type information is called Extended JSON. Following the examples, construct extended json for your data:

ext_json = '''
{
    "IdTitulo": "Jaws",
    "IdDirector": "Steven Spielberg",
    "IdNumber": 8,
    "IdDecimal": {"$numberDecimal":"2.33"}
}
'''

In Python, use json_util to load extended json into a Python dictionary:

from bson.json_util import loads

doc = loads(ext_json)

print(doc)

# {u'IdTitulo': u'Jaws', u'IdDirector': u'Steven Spielberg', u'IdDecimal': Decimal128('2.33'), u'IdNumber': 8}

The result of this load is sometimes referred to as a "BSON document" but it is not BSON, which is binary. "BSON" in this context really means that some values are not of python standard library types. The "document" part basically means the object is a dictionary.

You will notice that IdDecimal is of a non-standard library type:

print(type(doc['IdDecimal']))

# <class 'bson.decimal128.Decimal128'>
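
To see why the plain json module leads to the error in the question, it may help to compare what it returns with what a type-aware loader returns. The sketch below uses only the standard library: decode_extended is a hypothetical hook and decimal.Decimal stands in for Decimal128. It is not how bson.json_util is implemented, only an illustration of the idea.

```python
import json
from decimal import Decimal

ext_json = '{"IdTitulo": "Jaws", "IdDecimal": {"$numberDecimal": "2.33"}}'

# Plain json leaves the type wrapper in place as a nested dict whose key
# starts with '$'; that key is what the error in the question complains about.
plain = json.loads(ext_json)

def decode_extended(obj):
    """Hypothetical hook: unwrap {"$numberDecimal": "..."} into a decimal value."""
    if set(obj) == {"$numberDecimal"}:
        return Decimal(obj["$numberDecimal"])  # stand-in for Decimal128
    return obj

# object_hook is called for every decoded JSON object, innermost first.
typed = json.loads(ext_json, object_hook=decode_extended)

print(plain["IdDecimal"])  # {'$numberDecimal': '2.33'}
print(typed["IdDecimal"])  # 2.33
```

With PyMongo installed, bson.json_util.loads does this unwrapping for you and returns a real Decimal128.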

To insert this dictionary into MongoDB, follow the PyMongo tutorial:

from pymongo import MongoClient
client = MongoClient('localhost', 14420)

db = client.test_database

collection = db.test_collection

collection.insert_one(doc)

print(doc)

Upvotes: 0
