I am trying to unpack nested JSON in the following pandas dataframe:
id info
0 0 [{u'a': u'good', u'b': u'type1'}, {u'a': u'bad', u'b': u'type2'}]
1 1 [{u'a': u'bad', u'b': u'type1'}, {u'a': u'bad', u'b': u'type2'}]
2 2 [{u'a': u'good', u'b': u'type1'}, {u'a': u'good', u'b': u'type2'}]
My expected outcome is:
id type1 type2
0 0 good bad
1 1 bad bad
2 2 good good
I've been looking at other solutions, including json_normalize, but unfortunately it does not work for me. Should I treat the JSON as a string to get what I want? Or is there a more straightforward way to do this?
Upvotes: 9
Views: 11588
Reputation: 29711
Use json_normalize to handle a list of dictionaries and break the individual dicts into separate Series after setting the common path, which is info here. Then unstack and apply pd.Series, so that the records get appended downwards for that level.
import pandas as pd
from pandas.io.json import json_normalize  # deprecated since pandas 1.0; pd.json_normalize is the current spelling
df_info = json_normalize(df.to_dict('list'), ['info']).unstack().apply(pd.Series)
df_info
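As a rough illustration of the same "one row per dict" idea, here is a minimal sketch that reaches a similar long-format frame with Series.explode instead of json_normalize. It assumes pandas 0.25 or newer (where explode exists), and the names exploded and long_df are made up for this example. The result is not identical to df_info above: it keeps the original row label as a plain index rather than a MultiIndex.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [0, 1, 2],
    "info": [
        [{"a": "good", "b": "type1"}, {"a": "bad", "b": "type2"}],
        [{"a": "bad", "b": "type1"}, {"a": "bad", "b": "type2"}],
        [{"a": "good", "b": "type1"}, {"a": "good", "b": "type2"}],
    ],
})

# One element per dict; the original row label is repeated in the index.
exploded = df["info"].explode()

# Expand each dict into columns 'a' and 'b', keeping the repeated row labels.
long_df = pd.DataFrame(exploded.tolist(), index=exploded.index)
print(long_df)
```

Each original row now contributes two rows, one per dict, which is the shape the pivot step needs.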
Pivot the DF, with an optional aggfunc to handle a duplicated index axis:
DF = df_info.pivot_table(index=df_info.index.get_level_values(1), columns=['b'],
                         values=['a'], aggfunc=' '.join)
DF
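To see why the aggfunc matters, here is a small sketch on hypothetical data (not the data from the question) in which row 0 has two type1 entries. A plain pivot would raise a ValueError on the duplicate, while pivot_table with ' '.join merges the clashing strings into one cell. The frame long_df and its column name row are assumptions made for this example.

```python
import pandas as pd

# Hypothetical long-format data: row 0 carries two 'type1' entries.
long_df = pd.DataFrame({
    "row": [0, 0, 0, 1, 1],
    "b": ["type1", "type1", "type2", "type1", "type2"],
    "a": ["good", "bad", "bad", "bad", "good"],
})

# ' '.join is called once per (row, b) group and concatenates its strings.
wide = long_df.pivot_table(index="row", columns="b", values="a", aggfunc=" ".join)
print(wide)
```

With the question's data every (row, b) pair is unique, so the join simply returns the single value unchanged.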
Finally, concatenate the ID column and the pivoted values side by side:
pd.concat([df[['ID']], DF.xs('a', axis=1).rename_axis(None, axis=1)], axis=1)
Starting DF used:
df = pd.DataFrame(dict(ID=[0,1,2], info=[[{u'a': u'good', u'b': u'type1'}, {u'a': u'bad', u'b': u'type2'}],
[{u'a': u'bad', u'b': u'type1'}, {u'a': u'bad', u'b': u'type2'}],
[{u'a': u'good', u'b': u'type1'}, {u'a': u'good', u'b': u'type2'}]]))
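For comparison, here is a minimal sketch of a more direct route that skips json_normalize. It assumes every dict has exactly the keys a and b, and that each b value appears at most once per row. If a b value repeats within a row, the last one silently wins, unlike the ' '.join aggregation above. The names wide and result are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [0, 1, 2],
    "info": [
        [{"a": "good", "b": "type1"}, {"a": "bad", "b": "type2"}],
        [{"a": "bad", "b": "type1"}, {"a": "bad", "b": "type2"}],
        [{"a": "good", "b": "type1"}, {"a": "good", "b": "type2"}],
    ],
})

# Collapse each row's list of {'a': value, 'b': key} dicts into one {key: value} dict.
wide = pd.DataFrame(
    [{d["b"]: d["a"] for d in records} for records in df["info"]],
    index=df.index,
)

# Put the ID column and the new type columns side by side.
result = pd.concat([df[["ID"]], wide], axis=1)
print(result)
```

This trades the generality of the pivot for brevity, so it is only a reasonable choice when the structure of info is as regular as in the question.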
Upvotes: 11