Reputation: 443
In Python, I'm working with a dataset to determine how user reactions relate to post reach. My dataset is structured as follows, with the Reaction column holding nested data:
PostID Reach Reaction
01 787767 {"like":49852,"wow":8017,"haha":3200,"anger":3}
02 973183 {"like":57911,"wow":3013,"haha":8017,"anger":15}
03 ... ...
I want to restructure the data into separate reaction columns, so that the dataframe looks like this:
PostID Reach like wow haha anger
01 787767 49852 8017 3200 3
02 973183 57911 3013 8017 15
03 ... ...
Upvotes: 0
Views: 1699
Reputation: 57085
Convert the dictionaries to Pandas Series:
pd.concat([df.iloc[:,:2], df.Reaction.apply(pd.Series)],axis=1)
# PostID Reach anger haha like wow
#0 1 787767 3 3200 49852 8017
#1 2 973183 15 8017 57911 3013
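For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the two sample rows from the question and applies the same idea. It assumes the Reaction cells are already Python dicts (not JSON strings), and it drops the column by name rather than by position; the variable names reactions and result are only illustrative.

```python
import pandas as pd

# Sample frame rebuilt from the question; Reaction cells are assumed to be dicts.
df = pd.DataFrame({
    "PostID": ["01", "02"],
    "Reach": [787767, 973183],
    "Reaction": [
        {"like": 49852, "wow": 8017, "haha": 3200, "anger": 3},
        {"like": 57911, "wow": 3013, "haha": 8017, "anger": 15},
    ],
})

# Turn each dict into a Series, giving one column per reaction key.
reactions = df["Reaction"].apply(pd.Series)

# Place the expanded columns next to the remaining ones.
result = pd.concat([df.drop(columns="Reaction"), reactions], axis=1)
print(result)
```

Selecting by name with drop(columns=...) keeps working if the nested column is not the last one, which df.iloc[:, :2] would silently get wrong.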
Upvotes: 5
Reputation: 402852
Lots of ways to do this, assuming you have a column of JSON data. One simple way is applying json.loads to convert the strings to dicts, and then using DataFrame.from_records or json_normalize to load the result in.
import json
v = pd.DataFrame.from_records(df.Reaction.apply(json.loads))
Or,
v = pd.json_normalize(df.Reaction.apply(json.loads).tolist())
Finally, concatenate the result.
pd.concat([df.drop(columns='Reaction'), v], axis=1)
PostID Reach anger haha like wow
0 1 787767 3 3200 49852 8017
1 2 973183 15 8017 57911 3013
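Put together as a self-contained sketch, the JSON-string case might look like the following. It assumes every Reaction cell is a valid JSON string and that pandas is recent enough to expose json_normalize at the top level (1.0 or later); the sample rows are copied from the question, and parsed is just an illustrative name.

```python
import json

import pandas as pd

# Sample frame rebuilt from the question; here Reaction holds JSON strings.
df = pd.DataFrame({
    "PostID": ["01", "02"],
    "Reach": [787767, 973183],
    "Reaction": [
        '{"like":49852,"wow":8017,"haha":3200,"anger":3}',
        '{"like":57911,"wow":3013,"haha":8017,"anger":15}',
    ],
})

# Parse each JSON string into a dict, then collect the dicts in a plain list.
parsed = df["Reaction"].apply(json.loads).tolist()

# Build one column per key from the list of dicts.
v = pd.json_normalize(parsed)

# Attach the new columns to the rest of the frame.
result = pd.concat([df.drop(columns="Reaction"), v], axis=1)
print(result)
```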
On the other hand, if you have a column of dictionaries, then this should be faster:
v = pd.DataFrame.from_records(df.Reaction)
pd.concat([df.drop(columns='Reaction'), v], axis=1)
PostID Reach anger haha like wow
0 1 787767 3 3200 49852 8017
1 2 973183 15 8017 57911 3013
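One caveat worth sketching: concat with axis=1 aligns rows on the index, so if df no longer has the default 0..n-1 index (for example after filtering), a freshly built v will not line up with it. The variant below is only a sketch of one way around that, assuming a column of dicts; the index values 10 and 20 are made up for illustration.

```python
import pandas as pd

# Sample rows from the question, with a hypothetical non-default index
# such as one left over from filtering a larger frame.
df = pd.DataFrame(
    {
        "PostID": ["01", "02"],
        "Reach": [787767, 973183],
        "Reaction": [
            {"like": 49852, "wow": 8017, "haha": 3200, "anger": 3},
            {"like": 57911, "wow": 3013, "haha": 8017, "anger": 15},
        ],
    },
    index=[10, 20],
)

# Build the reaction columns and reuse df's index so the rows stay aligned.
v = pd.DataFrame(df["Reaction"].tolist(), index=df.index)

# join matches on the index, which is now identical on both sides.
result = df.drop(columns="Reaction").join(v)
print(result)
```

Calling df.reset_index(drop=True) before expanding is another option if the original index does not need to be kept.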
Upvotes: 2