Here is what my JSON file looks like (one JSON object per line):
{"File": "xyz.csv", "Line": "0", "Classes": [{"Name": "ABC", "Score": 0.9842}, {"Name": "DEF", "Score": 0.0128}, {"Name": "GHI", "Score": 0.003}]}
{"File": "xyz.csv", "Line": "1", "Classes": [{"Name": "ABC2", "Score": 0.9999}, {"Name": "DEF2", "Score": 0.1111}, {"Name": "GHI2", "Score": 0.5666}]}
I load it with:
pred_df = pd.read_json('filename.json', lines=True)
When I try to use json_normalize on the last column, "Classes", it gives me an error: string indices must be integers
Class = json_normalize(data=pred_df,
                       record_path=pred_df['Classes'],
                       meta=['Name', 'Score'])
Please let me know what I'm missing here. Thanks!
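For comparison, the error most likely comes from passing the DataFrame itself: json_normalize iterates over whatever it is given, and iterating over a DataFrame yields its column names, which are plain strings, so looking up a key in them fails. Below is a minimal sketch of how json_normalize is usually called on records like these. It assumes pandas 1.0 or later (where it is exposed as pd.json_normalize) and that each line is parsed with json.loads first. Here record_path is the key name 'Classes' rather than the column, and meta names the top-level fields File and Line, since Name and Score already come from the nested records. The names lines, records and flat are illustrative only.

```python
import json

import pandas as pd

# The two sample lines from the question. With the real file this would be
#   with open('filename.json') as f:
#       records = [json.loads(line) for line in f]
lines = [
    '{"File": "xyz.csv", "Line": "0", "Classes": [{"Name": "ABC", "Score": 0.9842}, '
    '{"Name": "DEF", "Score": 0.0128}, {"Name": "GHI", "Score": 0.003}]}',
    '{"File": "xyz.csv", "Line": "1", "Classes": [{"Name": "ABC2", "Score": 0.9999}, '
    '{"Name": "DEF2", "Score": 0.1111}, {"Name": "GHI2", "Score": 0.5666}]}',
]

# json_normalize expects plain dicts (a list of them here), not a DataFrame.
records = [json.loads(line) for line in lines]

# record_path is the *name* of the key holding the nested list;
# meta lists the top-level keys copied onto every flattened row.
flat = pd.json_normalize(records, record_path='Classes', meta=['File', 'Line'])
print(flat)
```

The nested fields come first and the meta fields are appended after them, giving the same column order as the pd.concat approach.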
Do this in two steps: first load your JSON, then flatten the "Classes" column and broadcast the rest of your data to match it using np.repeat (here via the equivalent ndarray.repeat method).
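import pandas as pd

df = pd.read_json('filename.json', lines=True)
classes = df.pop('Classes')  # remove the nested column; df keeps File and Line
pd.concat([
    pd.DataFrame(classes.sum()),  # one row per {"Name", "Score"} dict
    pd.DataFrame(df.values.repeat(classes.str.len(), axis=0), columns=[*df])  # repeat each File/Line once per class
], axis=1)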
   Name   Score     File  Line
0   ABC  0.9842  xyz.csv     0
1   DEF  0.0128  xyz.csv     0
2   GHI  0.0030  xyz.csv     0
3  ABC2  0.9999  xyz.csv     1
4  DEF2  0.1111  xyz.csv     1
5  GHI2  0.5666  xyz.csv     1
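As a self-contained check of the two steps above, here is a minimal sketch that inlines the two sample lines from the question instead of reading filename.json. It assumes the real file has exactly this line-delimited layout. The names raw, counts, flat, rest and result are illustrative only.

```python
import io

import pandas as pd

# The question's two sample lines, wrapped in a file-like object so that
# read_json(..., lines=True) treats them like the real file.
raw = io.StringIO(
    '{"File": "xyz.csv", "Line": "0", "Classes": [{"Name": "ABC", "Score": 0.9842}, '
    '{"Name": "DEF", "Score": 0.0128}, {"Name": "GHI", "Score": 0.003}]}\n'
    '{"File": "xyz.csv", "Line": "1", "Classes": [{"Name": "ABC2", "Score": 0.9999}, '
    '{"Name": "DEF2", "Score": 0.1111}, {"Name": "GHI2", "Score": 0.5666}]}\n'
)

df = pd.read_json(raw, lines=True)
classes = df.pop('Classes')         # Series holding one list of dicts per row
counts = classes.str.len()          # number of class dicts in each row

flat = pd.DataFrame(classes.sum())  # summing lists concatenates them into one list
rest = pd.DataFrame(df.values.repeat(counts, axis=0), columns=[*df])

result = pd.concat([flat, rest], axis=1)
print(result)
```

Both frames end up with a default index running from 0 to 5, so concatenating along axis=1 lines the rows up by position.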
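Replace classes.sum() with itertools.chain.from_iterable(classes) (after import itertools) if performance is important: summing lists builds a new list at every addition, so its cost grows quadratically with the number of rows.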
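A minimal sketch of that variant follows. To keep it self-contained, it builds the DataFrame directly from Python dicts rather than from the file, which is an assumption for illustration. The names records, flat, rest and result are hypothetical. chain.from_iterable yields the class dicts one at a time, so the flattened list is built in a single pass.

```python
import itertools

import pandas as pd

# Same data as the question's two lines, as already-parsed dicts.
records = [
    {"File": "xyz.csv", "Line": "0", "Classes": [
        {"Name": "ABC", "Score": 0.9842}, {"Name": "DEF", "Score": 0.0128},
        {"Name": "GHI", "Score": 0.003}]},
    {"File": "xyz.csv", "Line": "1", "Classes": [
        {"Name": "ABC2", "Score": 0.9999}, {"Name": "DEF2", "Score": 0.1111},
        {"Name": "GHI2", "Score": 0.5666}]},
]

df = pd.DataFrame(records)
classes = df.pop('Classes')

# Flatten the per-row lists lazily, then materialise them once.
flat = pd.DataFrame(list(itertools.chain.from_iterable(classes)))
rest = pd.DataFrame(df.values.repeat(classes.str.len(), axis=0), columns=[*df])

result = pd.concat([flat, rest], axis=1)
print(result)
```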