rpb

Reputation: 3299

Converting nested list of dictionary to dataframe using json_normalize in Pandas

The objective is to build a single DataFrame from a nested list of dicts such as the one below:

import pandas as pd

n = dict(de={'name': 'a', 'status': 'aa'}, th={'name': 'b', 'status': 'bb'}, al={'name': 'c', 'status': 'cc'})
NESTED_DICT = [dict(CH=dict(bm=[n, n], cm=[n, n], dm=[n, n]), PL=dict(bm=[n, n], cm=[n, n], dm=[n, n])), dict()]
data = [NESTED_DICT for _ in range(3)]

This can be achieved easily with nested for loops, as below:

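all_data = []

for xdata in data:
    # Only the first element of each entry holds data; the second is an empty dict.
    for con_type in ['CH', 'PL']:
        for condi in ['bm', 'cm', 'dm']:
            ndata = xdata[0][con_type][condi]
            # Each dict in ndata becomes a small frame indexed by de/th/al.
            df = pd.concat([pd.DataFrame.from_dict(x, orient='index') for x in ndata])
            all_data.append(df)

# Stack all the pieces into a single DataFrame.
df = pd.concat(all_data)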

which produces

   name status
de    a     aa
th    b     bb
al    c     cc
de    a     aa
th    b     bb
..  ...    ...
th    b     bb
al    c     cc
de    a     aa
th    b     bb
al    c     cc
[108 rows x 2 columns]

I'm looking for a more compact and efficient way of doing it.

I have come across json_normalize for nested data, which is very compact.

Based on the examples, I have the impression that this can be achieved with something like:

# For single subject

data_nested=data[0][0]
df=pd.json_normalize(data_nested,meta=['CH','PL'])

The output is something like

                                            CH.bm  ...                                              PL.dm
0  [{'de': {'name': 'a', 'status': 'aa'}, 'th': {...  ...  [{'de': {'name': 'a', 'status': 'aa'}, 'th': {...

This is expected.

Which parameter should be modified to get output like that of the nested for loop above?

Upvotes: 1

Views: 138

Answers (1)

Code Different

Reputation: 93191

No. json_normalize works better when the top level is a dict; here it is a list. The deeply nested structure also makes this very challenging for json_normalize.
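
As a rough illustration of that limitation, here is a sketch that assumes the question's sample data and looks at a single branch only (CH then bm, picked arbitrarily). Passing record_path to json_normalize gives one row per list element, but the de/th/al keys are flattened into dotted column names instead of becoming row labels as in the loop:

```python
import pandas as pd

# Sample data from the question (one subject only).
n = dict(de={'name': 'a', 'status': 'aa'},
         th={'name': 'b', 'status': 'bb'},
         al={'name': 'c', 'status': 'cc'})
data_nested = dict(CH=dict(bm=[n, n], cm=[n, n], dm=[n, n]),
                   PL=dict(bm=[n, n], cm=[n, n], dm=[n, n]))

# record_path walks CH -> bm and treats each element of that list as a record.
wide = pd.json_normalize(data_nested, record_path=['CH', 'bm'])

print(wide.shape)
print(list(wide.columns))
```

Getting from this wide layout to the long name/status layout would still need a reshape for every CH/PL and bm/cm/dm combination, so it is unlikely to end up more compact than the loop.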

You can make the loop more readable with a list comprehension:

all_data = [
    pd.DataFrame.from_dict(x, orient='index')
    for xdata in data
    for con_type in ['CH', 'PL']
    for condi in ['bm', 'cm', 'dm']
    for x in xdata[0][con_type][condi]
]
df = pd.concat(all_data)
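
If efficiency matters, one possible variation (a sketch, not something from the question) is to skip the many small DataFrames and collect plain (label, record) pairs instead, then build the DataFrame once. The names rows, index and records are only illustrative:

```python
import pandas as pd

# Sample data from the question.
n = dict(de={'name': 'a', 'status': 'aa'},
         th={'name': 'b', 'status': 'bb'},
         al={'name': 'c', 'status': 'cc'})
NESTED_DICT = [dict(CH=dict(bm=[n, n], cm=[n, n], dm=[n, n]),
                    PL=dict(bm=[n, n], cm=[n, n], dm=[n, n])), dict()]
data = [NESTED_DICT for _ in range(3)]

# Flatten everything into (row label, record) pairs with one comprehension.
rows = [
    (key, rec)
    for xdata in data
    for con_type in ['CH', 'PL']
    for condi in ['bm', 'cm', 'dm']
    for x in xdata[0][con_type][condi]
    for key, rec in x.items()
]

# Split the pairs and construct the DataFrame a single time.
index, records = zip(*rows)
df = pd.DataFrame(list(records), index=list(index))
print(df.shape)
```

This keeps the same row order and the same de/th/al index as the concat version, while avoiding one DataFrame construction per inner dict.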

Upvotes: 1
