Flatten JSON data using pandas json_normalize

Question

Here is what my JSON file looks like:

{"File": "xyz.csv", "Line": "0", "Classes": [{"Name": "ABC", "Score": 0.9842}, {"Name": "DEF", "Score": 0.0128}, {"Name": "GHI", "Score": 0.003}]}
{"File": "xyz.csv", "Line": "1", "Classes": [{"Name": "ABC2", "Score": 0.9999}, {"Name": "DEF2", "Score": 0.1111}, {"Name": "GHI2", "Score": 0.5666}]}

pred_df = pd.read_json('filename.json',lines=True)

When I tried to use json_normalize on the last column, "Classes", it gave me an error: string indices must be integers

Class = json_normalize(data = pred_df,
                  record_path= pred_df['Classes'],
                  meta =['Name','Score'])

Please let me know what I'm missing here. Thanks!

cs95 · Accepted Answer

Do this in two steps. The first loads your JSON; the second flattens your "Classes" column and broadcasts the rest of your data to it by repeating each row (NumPy's repeat) once per class record.

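import pandas as pd

df = pd.read_json('filename.json', lines=True)

# Pull the list-of-dicts column out of df; File and Line stay behind.
classes = df.pop('Classes')
pd.concat([
    # One row per class record: sum() concatenates the per-row lists into one list.
    pd.DataFrame(classes.sum()),
    # Repeat each remaining row once per class record so the two frames line up.
    pd.DataFrame(df.values.repeat(classes.str.len(), axis=0), columns=[*df])
], axis=1)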

   Name   Score     File Line
0   ABC  0.9842  xyz.csv    0
1   DEF  0.0128  xyz.csv    0
2   GHI  0.0030  xyz.csv    0
3  ABC2  0.9999  xyz.csv    1
4  DEF2  0.1111  xyz.csv    1
5  GHI2  0.5666  xyz.csv    1
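
To see what the repeat step does on its own, here is a small sketch. The inline frame is an assumption: it stands in for the file from the question so the snippet runs without filename.json, and the rows are deliberately uneven (two classes, then one).

```python
import pandas as pd

# Toy stand-in for the question's data (assumption: built inline, not read from a file).
df = pd.DataFrame({
    "File": ["xyz.csv", "xyz.csv"],
    "Line": ["0", "1"],
    "Classes": [
        [{"Name": "ABC", "Score": 0.9842}, {"Name": "DEF", "Score": 0.0128}],
        [{"Name": "ABC2", "Score": 0.9999}],
    ],
})

classes = df.pop("Classes")

# Number of class records held by each original row.
counts = classes.str.len()

# Repeat each remaining row once per class record it owned.
repeated = pd.DataFrame(df.values.repeat(counts, axis=0), columns=[*df])
print(repeated)
```

The first row appears twice and the second once, so the result has the same number of rows as the flattened class records and can be concatenated to them column-wise.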

If performance is important, replace classes.sum() with itertools.chain.from_iterable(classes): sum() builds a new list at every step, whereas chain walks each list only once.
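
A sketch of that swap, with a json_normalize call alongside for comparison, since that is what the question was reaching for. Two assumptions: the records from the question are inlined as a string instead of being read from filename.json, and json_normalize is given the parsed records (a list of dicts) with record_path and meta passed as key names, not as a DataFrame or a column.

```python
import io
import itertools
import json

import pandas as pd

# The two records from the question, inlined (assumption: stands in for filename.json).
raw = (
    '{"File": "xyz.csv", "Line": "0", "Classes": [{"Name": "ABC", "Score": 0.9842}, '
    '{"Name": "DEF", "Score": 0.0128}, {"Name": "GHI", "Score": 0.003}]}\n'
    '{"File": "xyz.csv", "Line": "1", "Classes": [{"Name": "ABC2", "Score": 0.9999}, '
    '{"Name": "DEF2", "Score": 0.1111}, {"Name": "GHI2", "Score": 0.5666}]}\n'
)

# Variant 1: same approach as the answer, but chain the lists instead of summing them.
df = pd.read_json(io.StringIO(raw), lines=True)
classes = df.pop("Classes")
flat = pd.concat([
    pd.DataFrame(list(itertools.chain.from_iterable(classes))),
    pd.DataFrame(df.values.repeat(classes.str.len(), axis=0), columns=[*df]),
], axis=1)

# Variant 2: json_normalize on the parsed records.
# record_path names the nested list; meta names the top-level keys to carry along.
records = [json.loads(line) for line in raw.splitlines()]
normalized = pd.json_normalize(records, record_path="Classes", meta=["File", "Line"])

print(flat)
print(normalized)
```

Both variants give one row per class record with the columns Name, Score, File, Line.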
