Reputation: 1745
I have an example output JSON in dictionary form:
d = {'httpStatus': 200,
'httpStatusMessage': 'Success',
'timestamp': '2022-02-16T19:06:00.1924563Z',
'response': {'header': {'companyId': '1000',
'companyName': 'Bobs Groceries'},
'body': {'dataSources': [{'dataSource': 3,
'employees': [{'employeeId': '25',
'employeeReference': '1500',
'activeFlag': True,
'createdDate': '2022-01-27T15:20:38.2700000Z',
'lastUpdate': '2022-01-27T15:20:38.3500000Z',
'firstName': 'Bob',
'lastName': 'Brantley'},
{'employeeId': '28',
'employeeReference': '1505',
'activeFlag': True,
'createdDate': '2022-01-27T15:20:24.2400000Z',
'lastUpdate': '2022-01-27T15:20:24.2400000Z',
'firstName': 'Jeffrey',
'lastName': 'Johnson'}]}]}}}
I would like to recursively parse this JSON and create a master dictionary with the list elements exploded into new dictionary keys with the values as lists. So essentially I want to convert lists of dictionaries into dictionaries of lists.
For the above example I desire the output to be something like:
output = {'httpStatus': 200,
'httpStatusMessage': 'Success',
'timestamp': '2022-02-16T19:06:00.1924563Z',
'response_header_companyId': '1000',
'companyName': 'Bobs Groceries',
'dataSource': 3,
'employeeId': ['25','28'],
'employeeReference': ['1500','1505'],
'activeFlag': [True, True],
'createdDate': ['2022-01-27T15:20:38.2700000Z','2022-01-27T15:20:24.2400000Z'],
'lastUpdate': ['2022-01-27T15:20:38.3500000Z','2022-01-27T15:20:24.2400000Z'],
'firstName': ['Bob','Jeffrey'],
'lastName': ['Brantley','Johnson']}
I can come somewhat close using this function:
def flatten_nested_json(d: dict)-> dict:
"""
Accepts Dictionary argument which can have nested dictionaries/lists within.
Output will be a flat dictionary that can be converted to a pandas dataframe
"""
out = {}
def flatten(x, name: str=''):
# handles dictionaries with elements
if type(x) is dict:
for k in x:
flatten(x[k], name + k + "_")
# handles lists with elements
elif type(x) is list:
for j in x:
if type(j) is dict:
for y, z in j.items():
out[y] = z
else:
out[j] = x
else:
out[name[:-1]] = x
flatten(d)
return out
However, the list of dictionaries with employeeId
, etc. remains as a dictionary. I need to find a way to add the recursion for that part so I can explode the dictionaries into new keys with the values as lists. Basically I want to combine all of those dictionaries with similar keys in a list into a single dictionary with list elements containing the values of each dictionary.
A dynamic approach without pandas would be desired. Thanks!
Upvotes: 1
Views: 370
Reputation: 1928
jsonpath-ng
can parse even such a nested json object very easily. It can be installed by the following command:
pip install --upgrade jsonpath-ng
from collections import defaultdict
import jsonpath_ng as jp
def flatten_nested_json(d: dict)-> dict:
expr = jp.parse('$..*')
dd = defaultdict(list)
for m in expr.find(d):
if not isinstance(m.value, (dict, list)):
dd[str(m.path)].append(m.value)
for k, v in dd.items():
if len(v) == 1:
dd[k] = v[0]
return dict(dd)
result = flatten_nested_json(d)
The key point is the following line:
expr = jp.parse('$..*')
By the above expression, you can search target elements at any level of the json object. README.rst tells you what each syntax stands for:
Syntax | Meaning |
---|---|
$ |
The root object |
jsonpath1 .. jsonpath2 |
All nodes matched by jsonpath2 that descend from any node matching jsonpath1 |
* |
any field |
{'httpStatus': 200,
'httpStatusMessage': 'Success',
'timestamp': '2022-02-16T19:06:00.1924563Z',
'companyId': '1000',
'companyName': 'Bobs Groceries',
'dataSource': 3,
'employeeId': ['25', '28'],
'employeeReference': ['1500', '1505'],
'activeFlag': [True, True],
'createdDate': ['2022-01-27T15:20:38.2700000Z',
'2022-01-27T15:20:24.2400000Z'],
'lastUpdate': ['2022-01-27T15:20:38.3500000Z',
'2022-01-27T15:20:24.2400000Z'],
'firstName': ['Bob', 'Jeffrey'],
'lastName': ['Brantley', 'Johnson']}
If you want a full path to each value, try the following code:
from collections import defaultdict
import jsonpath_ng as jp
import re
auto_id_field = 'json_path'
jp.jsonpath.auto_id_field = auto_id_field
def flatten_nested_json(d: dict)-> dict:
expr = jp.parse(f'$..*.{auto_id_field}')
pat = re.compile(r'(?:\.\[\d+]\.|\.)')
dd = defaultdict(list)
for m in expr.find(d):
if not isinstance(m.datum.value, (dict, list)):
dd[pat.sub('_', m.value)].append(m.datum.value)
for k, v in dd.items():
if len(v) == 1:
dd[k] = v[0]
return dict(dd)
result = flatten_nested_json(d)
{'httpStatus': 200,
'httpStatusMessage': 'Success',
'timestamp': '2022-02-16T19:06:00.1924563Z',
'response_header_companyId': '1000',
'response_header_companyName': 'Bobs Groceries',
'response_body_dataSources_dataSource': 3,
'response_body_dataSources_employees_employeeId': ['25', '28'],
'response_body_dataSources_employees_employeeReference': ['1500', '1505'],
'response_body_dataSources_employees_activeFlag': [True, True],
'response_body_dataSources_employees_createdDate': ['2022-01-27T15:20:38.2700000Z',
'2022-01-27T15:20:24.2400000Z'],
'response_body_dataSources_employees_lastUpdate': ['2022-01-27T15:20:38.3500000Z',
'2022-01-27T15:20:24.2400000Z'],
'response_body_dataSources_employees_firstName': ['Bob', 'Jeffrey'],
'response_body_dataSources_employees_lastName': ['Brantley', 'Johnson']}
Upvotes: 1