Akash Kumar
Akash Kumar

Reputation: 742

How to read nested objects in a json file as dataframe?

I have a .json file with content:

{"success":true,"code":"SUCCESS","data":{"from":1514745000000,"to":1522175400000,"transactionData":[{"name":"Recharge & bill payments","paymentInstruments":[{"type":"TOTAL","count":4200,"amount":1845307.4673655091}]},{"name":"Peer-to-peer payments","paymentInstruments":[{"type":"TOTAL","count":1871,"amount":1.2138655299749982E7}]},{"name":"Merchant payments","paymentInstruments":[{"type":"TOTAL","count":298,"amount":452507.168646613}]},{"name":"Financial Services","paymentInstruments":[{"type":"TOTAL","count":33,"amount":10601.419933464953}]},{"name":"Others","paymentInstruments":[{"type":"TOTAL","count":256,"amount":184689.8662902223}]}]},"responseTimestamp":1630501487199}

I want to convert it into a pandas data frame. But when I apply:

a = pd.read_json('/1.json')

I get output like this: enter image description here

How can I get it in the correct pandas DataFrame format?

Upvotes: 0

Views: 114

Answers (1)

Himanshu Poddar
Himanshu Poddar

Reputation: 7799

Since you want to read data key in your dictionary. You can load the json as dictionary in memory and then use pandas to convert the same to a dataframe.

As discussed in comments : This is the expected output

enter image description here

Solution :

Here we are reading the json data first, then converting the data >>> transaction key to a pandas dataframe.

The above gives us a dataframe containing list values using pandas.DataFrame.explode for paymentInstruments. We need to unravel it such that we transform each element of a list-like to a row, replicating index values.

Once the values in lists are replicated we can convert the dictionries obtained to columns using pandas.Series.apply or pandas.json_normalize

Reading Data

import json

f = open(r'/1.json')
data = json.load(f)
                           
df = pd.DataFrame.from_dict(data['data']['transactionData'])

Pre-processing and generating Data

df = df.explode('paymentInstruments')
df = pd.concat([df, df.pop('paymentInstruments').apply(pd.Series)], axis = 1)

Output

This gives us the expected output

enter image description here

Alternatively :

Here is json_normalize solution

df = df.explode('paymentInstruments')
df = pd.concat([df, pd.json_normalize(df.pop('paymentInstruments'))], axis = 1)

This gives us :

enter image description here

Upvotes: 1

Related Questions