Robin Nicole
Reputation: 666

Schemas to deserialize a JSON string in Python

I would like to deserialize JSON whose schema I have defined in advance. Here is a typical JSON document I deal with, shown as the Python object it parses to:

{'op': 'mcm',
 'id': 1,
 'clk': 'AKjT4QEAl5q/AQCW7rIB',
 'pt': 1563999965598,
 'mc': [{'id': '1.160679253',
   'rc': [{'atl': [[1.18, 88.5],
      [1.17, 152.86],
      [1.16, 175.96],
      [1.14, 93.3],
      [1.08, 28.08],
      [1.07, 8.84],
      [1.02, 129.74]],
     'id': 1}]}]}

for which I would like a schema like this:

{'op': String,
 'id': Integer,
 'clk': String,
 'pt': Integer,
 'mc': [{'id': String,
   'rc': [{'atl': Array(Decimal),
     'id': Integer}]}]}

I know it is possible to do this with PySpark, but I am looking for a lighter solution (something on top of the json package, for example).

Here is what I have tried so far:

Thanks in advance for your help

Upvotes: 0

Views: 499

Answers (1)

Zaroth
Reputation: 578

Your first approach is the most lightweight one, since it requires nothing but the standard library: just write a custom function on top of the json module, tailored to what you need. As for the float-to-Decimal conversion and the loss of precision, json.loads() has a parse_float parameter that makes it parse floating-point numbers as Decimal right away:

>>> import decimal
>>> import json
>>> json.loads('1.1', parse_float=decimal.Decimal)
Decimal('1.1')

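As for the ID field: if it ends up parsed as a Decimal because of its float-like format, you can treat it as a special case and convert it back to a string with str(), with no loss of information. Note that parse_float is applied only to JSON number tokens, so an ID that is quoted in the JSON text stays a string anyway.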
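To make the "custom function on top of json" idea concrete, here is a minimal sketch, not a definitive implementation. The names SCHEMA, check and load are hypothetical helpers invented for this example, and the schema is written with plain Python types as an assumption about how you might want to express it. The sample string is a shortened version of the document from the question.

```python
import json
from decimal import Decimal

# Hypothetical schema: dicts and one-element lists mirror the JSON shape,
# leaves are the exact Python types expected after parsing.
SCHEMA = {
    "op": str,
    "id": int,
    "clk": str,
    "pt": int,
    "mc": [{"id": str, "rc": [{"atl": [[Decimal]], "id": int}]}],
}


def check(value, schema, path="$"):
    """Raise TypeError if value does not match schema (hypothetical helper)."""
    if isinstance(schema, dict):
        if not isinstance(value, dict):
            raise TypeError(f"{path}: expected object")
        for key, sub in schema.items():
            if key not in value:
                raise TypeError(f"{path}.{key}: missing key")
            check(value[key], sub, f"{path}.{key}")
    elif isinstance(schema, list):
        if not isinstance(value, list):
            raise TypeError(f"{path}: expected array")
        for i, item in enumerate(value):
            check(item, schema[0], f"{path}[{i}]")
    elif type(value) is not schema:
        # Exact type comparison, so that e.g. True is not accepted as an int.
        raise TypeError(
            f"{path}: expected {schema.__name__}, got {type(value).__name__}"
        )


def load(text):
    """Parse JSON with floats as Decimal, then validate against SCHEMA."""
    data = json.loads(text, parse_float=Decimal)
    check(data, SCHEMA)
    return data


raw = (
    '{"op": "mcm", "id": 1, "clk": "AKjT4QEAl5q/AQCW7rIB", '
    '"pt": 1563999965598, "mc": [{"id": "1.160679253", '
    '"rc": [{"atl": [[1.18, 88.5], [1.17, 152.86]], "id": 1}]}]}'
)
data = load(raw)

# The str() special case: a Decimal built from the JSON text round-trips.
unquoted_id = json.loads("1.160679253", parse_float=Decimal)
id_as_text = str(unquoted_id)
```

One caveat with this sketch: a whole-number price such as 2 in the JSON text would be parsed as int rather than Decimal and rejected by the exact type check, so you may want to relax that leaf rule or convert such values explicitly.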

Upvotes: 2
