ga1996
ga1996

Reputation: 125

How to normalize json from pandas dataframe

I have this json file sample:

{"T1": [{"time":"2021-03-25T13:31:58+01:00","data":-1}],"T2": [{"time":"2021-03-25T13:17:02+01:00","data":[23,35,54,44,55,44,33,44,55,66,88,65]}]}

and I am loading it like this:

import os 
import glob
import pandas  as pd
import json

path_to_json = 'dir/dir/data.json' 

df = pd.read_json(path_to_json, lines=True)

df

and it looks like this:

enter image description here

When I try to call json_normalize like pd.json_normalize(df) it doesn't work. Any suggestions on how to normalize this json dataframe into T1_time, T1_data and so on would be highly appreciated!

Upvotes: 1

Views: 255

Answers (2)

furas
furas

Reputation: 142631

I found it is simpler to get first dictionary from list, convert to Series and concatenate

df1 = df['T1'].str[0].apply(pd.Series).add_prefix('T1.')
df2 = df['T2'].str[0].apply(pd.Series).add_prefix('T2.')

new_df = pd.concat([df1, df2], axis=1)

And the same using loop with column's names

dfs = [df[col].str[0].apply(pd.Series).add_prefix(f'{col}.') for col in df.columns]

new_df = pd.concat(dfs, axis=1)

EDIT:

For more dictionares in list you can use .explode() to move values to separated rows.

df1 = df['T1'].explode().apply(pd.Series).add_prefix('T1.')
df2 = df['T2'].explode().apply(pd.Series).add_prefix('T2.')

new_df = pd.concat([df1, df2], axis=1)
dfs = [df[col].explode().apply(pd.Series).add_prefix(f'{col}.') for col in df.columns]

new_df = pd.concat(dfs, axis=1)

But this will work if you will have the same number of dictionaries in T1 and T2. If one of them will have less dictionares then it may repeate dictionary in some rows.


Minimal working code.

I use io.StrigIO() only to simulate file in memory - so everyone can simply copy and run it

import pandas as pd
import io

data = '{"T1": [{"time":"2021-03-25T13:31:58+01:00","data":-1},{"time":"1990-03-25T13:31:58+01:00","data":999}], "T2": [{"time":"2021-03-25T13:17:02+01:00","data":[23,35,54,44,55,44,33,44,55,66,88,65]}, {"time":"2021-03-25T13:17:02+01:00","data":[23]}]}'
df = pd.read_json(io.StringIO(data), lines=True)
print(df)

# ----------------

df1 = df['T1'].explode().apply(pd.Series).add_prefix('T1.')
df2 = df['T2'].explode().apply(pd.Series).add_prefix('T2.')
new_df = pd.concat([df1, df2], axis=1)
print(new_df.to_string())

# ----------------

dfs = [df[col].explode().apply(pd.Series).add_prefix(f'{col}.') for col in df.columns]
new_df = pd.concat(dfs, axis=1)
print(new_df.to_string())

# ----------------

df = df.explode(df.columns.tolist())
dfs = [df[col].apply(pd.Series).add_prefix(f'{col}.') for col in df.columns]
new_df = pd.concat(dfs, axis=1)
print(new_df.to_string())

Result:

                                                  T1                                                 T2
0  [{'time': '2021-03-25T13:31:58+01:00', 'data':...  [{'time': '2021-03-25T13:17:02+01:00', 'data':...

                     T1.time  T1.data                    T2.time                                           T2.data
0  2021-03-25T13:31:58+01:00       -1  2021-03-25T13:17:02+01:00  [23, 35, 54, 44, 55, 44, 33, 44, 55, 66, 88, 65]
0  1990-03-25T13:31:58+01:00      999  2021-03-25T13:17:02+01:00                                              [23]

                     T1.time  T1.data                    T2.time                                           T2.data
0  2021-03-25T13:31:58+01:00       -1  2021-03-25T13:17:02+01:00  [23, 35, 54, 44, 55, 44, 33, 44, 55, 66, 88, 65]
0  1990-03-25T13:31:58+01:00      999  2021-03-25T13:17:02+01:00                                              [23]

Upvotes: 1

Bushmaster
Bushmaster

Reputation: 4608

can you try this:

import os 
import glob
import pandas  as pd
import json

path_to_json = 'dir/dir/data.json' 

df = pd.read_json(path_to_json, lines=True)
df=df.explode('T1').explode('T2')
df=df.join(pd.json_normalize(df.pop('T1')))
df=pd.concat([df,pd.json_normalize(df.pop('T2'))])
print(df)
'''
    time                        data
0   2021-03-25T13:31:58+01:00   -1
0   2021-03-25T13:17:02+01:00   [23, 35, 54, 44, 55, 44, 33, 44, 55, 66, 88, 65]

'''

Upvotes: 1

Related Questions