Reputation: 666
I would like to deserialize JSON of which I predefined the schema. Here is a typical JSON file I deal with.
{"op": "mcm",
 "id": 1,
 "clk": "AKjT4QEAl5q/AQCW7rIB",
 "pt": 1563999965598,
 "mc": [{"id": "1.160679253",
         "rc": [{"atl": [[1.18, 88.5],
                         [1.17, 152.86],
                         [1.16, 175.96],
                         [1.14, 93.3],
                         [1.08, 28.08],
                         [1.07, 8.84],
                         [1.02, 129.74]],
                 "id": 1}]}]}
for which I would like a schema like that:
{'op': String,
 'id': Integer,
 'clk': String,
 'pt': Integer,
 'mc': [{'id': String,
         'rc': [{'atl': Array(Decimal),
                 'id': Integer}]}]}
I know it is possible to do that with PySpark, but I am looking for a lighter solution (something on top of the json package, for example).
Here is what I have tried so far: JSONDecoder (https://docs.python.org/3/library/json.html#json.JSONDecoder) with custom parse_float, parse_int, and parse_constant functions. Those functions only take the string to be parsed as an argument, so I would have to treat '1.160679253' (just after pt) and '1.18' (just after atl) the same way, while I want '1.160679253' to remain a string and '1.18' to be cast as a decimal (see the snippet below). Thanks in advance for your help.
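For illustration, a minimal sketch of the problem, with a shortened payload and the market id written unquoted for the sake of the demonstration: parse_float receives nothing but the token text, with no context about which key it belongs to, so both numbers get the same treatment:
>>> import decimal, json
>>> raw = '{"pt": 1563999965598, "mc": [{"id": 1.160679253, "rc": [{"atl": [[1.18, 88.5]], "id": 1}]}]}'
>>> json.loads(raw, parse_float=decimal.Decimal)
{'pt': 1563999965598, 'mc': [{'id': Decimal('1.160679253'), 'rc': [{'atl': [[Decimal('1.18'), Decimal('88.5')]], 'id': 1}]}]}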
Upvotes: 0
Views: 499
Reputation: 578
Your first approach is the most lightweight one, as it requires nothing but the standard library: just write a custom function on top of the json package, tailored to what you need. As for the float-to-decimal conversion and precision loss, json.loads() has a parse_float parameter to parse floating-point numbers as Decimals straight away:
>>> import json
>>> import decimal
>>> json.loads('1.1', parse_float=decimal.Decimal)
Decimal('1.1')
As for the ID field, which will be parsed as a Decimal too because of its float-like format: as a special case, you can simply convert it back to a string via str(), with no loss of information.
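Putting it together, a minimal sketch (the helper name parse_market and the shortened payload are mine, and it assumes, as above, that the market id arrives as an unquoted number):
>>> import decimal, json
>>> def parse_market(text):
...     # parse every float token as Decimal, then restore market ids to strings
...     data = json.loads(text, parse_float=decimal.Decimal)
...     for market in data.get('mc', []):
...         market['id'] = str(market['id'])  # Decimal -> str, no precision lost
...     return data
...
>>> doc = parse_market('{"op": "mcm", "pt": 1563999965598, "mc": [{"id": 1.160679253, "rc": [{"atl": [[1.18, 88.5]], "id": 1}]}]}')
>>> doc['mc'][0]['id']
'1.160679253'
>>> doc['mc'][0]['rc'][0]['atl']
[[Decimal('1.18'), Decimal('88.5')]]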
Upvotes: 2