Reputation: 666
I would like to deserialize JSON of which I predefined the schema. Here is a typical JSON file I deal with.
{"op": "mcm",
 "id": 1,
 "clk": "AKjT4QEAl5q/AQCW7rIB",
 "pt": 1563999965598,
 "mc": [{"id": "1.160679253",
         "rc": [{"atl": [[1.18, 88.5],
                         [1.17, 152.86],
                         [1.16, 175.96],
                         [1.14, 93.3],
                         [1.08, 28.08],
                         [1.07, 8.84],
                         [1.02, 129.74]],
                 "id": 1}]}]}
for which I would like a schema like that:
{'op': String,
 'id': Integer,
 'clk': String,
 'pt': Integer,
 'mc': [{'id': String,
         'rc': [{'atl': Array(Decimal),
                 'id': Integer}]}]}
I know it is possible to do that with PySpark, but I am looking for a lighter solution (something on top of the json package, for example).
Here is what I have tried so far: JSONDecoder (https://docs.python.org/3/library/json.html#json.JSONDecoder) with custom parse_float, parse_int, and parse_constant functions. Those functions only take the string to be parsed as an argument, so I would have to treat '1.160679253' (just after pt) and '1.18' (just after atl) the same way, while I want '1.160679253' to remain a string and '1.18' to be cast as a decimal (see the snippet below). Thanks in advance for your help.
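For illustration, a minimal sketch of the problem, with a shortened payload and the market id written unquoted for the sake of the demonstration: parse_float receives nothing but the token text, with no context about which key it belongs to, so both numbers get the same treatment:
>>> import decimal, json
>>> raw = '{"pt": 1563999965598, "mc": [{"id": 1.160679253, "rc": [{"atl": [[1.18, 88.5]], "id": 1}]}]}'
>>> json.loads(raw, parse_float=decimal.Decimal)
{'pt': 1563999965598, 'mc': [{'id': Decimal('1.160679253'), 'rc': [{'atl': [[Decimal('1.18'), Decimal('88.5')]], 'id': 1}]}]}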
Upvotes: 0
Views: 499
Reputation: 578
Your first approach is the most lightweight one, as it requires nothing but the standard library: just write a custom function on top of the json package, tailored to what you need. As for the float-to-decimal conversion and precision loss, json.loads() has a parse_float parameter to parse floating-point numbers as Decimals straight away:
>>> import json
>>> import decimal
>>> json.loads('1.1', parse_float=decimal.Decimal)
Decimal('1.1')
As for the ID field, which will be parsed as a Decimal too because of its float-like format: as a special case, you can simply convert it back to a string via str(), with no loss of information.
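Putting it together, a minimal sketch (the helper name parse_market and the shortened payload are mine, and it assumes, as above, that the market id arrives as an unquoted number):
>>> import decimal, json
>>> def parse_market(text):
...     # parse every float token as Decimal, then restore market ids to strings
...     data = json.loads(text, parse_float=decimal.Decimal)
...     for market in data.get('mc', []):
...         market['id'] = str(market['id'])  # Decimal -> str, no precision lost
...     return data
...
>>> doc = parse_market('{"op": "mcm", "pt": 1563999965598, "mc": [{"id": 1.160679253, "rc": [{"atl": [[1.18, 88.5]], "id": 1}]}]}')
>>> doc['mc'][0]['id']
'1.160679253'
>>> doc['mc'][0]['rc'][0]['atl']
[[Decimal('1.18'), Decimal('88.5')]]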
Upvotes: 2