Nukedd
Nukedd

Reputation: 28

Python Pandas Flatten nested JSON

I'm new to python & pandas and it took me awhile to get the results I wanted using below. Basically, I am trying to flatten a nested JSON. While json_normalize works, there are columns where it contains a list of objects(key/value). I want to breakdown that list and add them as separate columns. Below sample code I wrote worked out fine but was wondering if this could be simplified or improved further or an alternative? Most of the articles I've found relies on actually naming the columns and such (codes that I cant pick up on just yet), but I would rather have this as a function and name the columns dynamically. The output csv will be used for Power BI.

with open('json.json', 'r') as json_file: jsondata = json.load(json_file)

df = pd.json_normalize(jsondata['results'], errors='ignore')

y = df.columns[df.applymap(type).eq(list).any()]

for x in y:

    df = df.explode(x).reset_index(drop=True)

    df_exploded = pd.json_normalize(df.pop(x))

    for i in df_exploded.columns:
        
        df_exploded = df_exploded.rename(columns={i:x + '_' + i})

    df = df.join(df_exploded)

df.to_csv('json.csv')

Sample JSON Format (Not including the large JSON I was working on):

data = {
        'results': [
                        {
                            'recordType': 'recordType',
                            'id': 'id',
                            'values': 
                              {
                                'orderType': [{'value': 'value', 'text': 'text'}],
                                'type': [{'value': 'value', 'text': 'text'}],
                                'entity': [{'value': 'value', 'text': 'text'}],
                                'trandate': 'trandate'
                              }
                        }
                   ]
       }

The values part when json_normalized, doesn't get flatten and required explode and joined.

Upvotes: 1

Views: 172

Answers (1)

Bushmaster
Bushmaster

Reputation: 4608

You can use something like this:

df = pd.json_normalize(jsondata['results'],meta=['recordType','id'])[['recordType','id','values.trandate']]

record_paths = [['values','orderType'],['values','type'],['values','entity']]
for i in record_paths:
    df = pd.concat([df,pd.json_normalize(jsondata['results'],record_path=i,record_prefix=i[1])],axis=1)

df.to_csv('json.csv')

Or (much faster):

df = pd.DataFrame({'recordType':[i['recordType'] for i in jsondata['results']],
                 'id':[i['id'] for i in jsondata['results']],
                 'values.trandate':[i['values']['trandate'] for i in jsondata['results']],
                 'orderTypevalue':[i['values']['orderType'][0]['value'] for i in jsondata['results']],
                 'orderTypetext':[i['values']['orderType'][0]['text'] for i in jsondata['results']],
                 'typevalue':[i['values']['type'][0]['value'] for i in jsondata['results']],
                 'typetext':[i['values']['type'][0]['text'] for i in jsondata['results']],
                 'entityvalue':[i['values']['entity'][0]['value'] for i in jsondata['results']],
                 'entitytext':[i['values']['entity'][0]['text'] for i in jsondata['results']]})

df.to_csv('json.csv')

Output:

|    | recordType   | id   | values.trandate   | orderTypevalue   | orderTypetext   | typevalue   | typetext   | entityvalue   | entitytext   |
|---:|:-------------|:-----|:------------------|:-----------------|:----------------|:------------|:-----------|:--------------|:-------------|
|  0 | recordType   | id   | trandate          | value            | text            | value       | text       | value         | text         |

Upvotes: 1

Related Questions