Reputation: 103
Suggest I have data of the format in one field of (some nested) JSON:
Name Identifier Data
Joe 54872 [{"ref":{"type":4,"id":86669},"side":"Buy","ratio":1},{"ref":{"type":4,"id":80843},"side":"Sell","ratio":1}]
Jill 84756 [{"ref":{"type":4,"id":75236},"side":"Buy","ratio":1},{"ref":{"type":4,"id":75565},"side":"Sell","ratio":1}]
Is there a simplistic way, rather than unpacking the json into it's own dataframe then concatenating it with the fixed data for each row of len(n) where n is the length of each json dataframe, to produce the following data?
Name Identifier ref_type ref_id side ratio
Joe 54872 4 86669 buy 1
Joe 54872 4 80843 sell 1
Jill 84756 4 75236 buy 1
Jill 84756 4 75565 sell 1
Thanks.
Upvotes: 2
Views: 1554
Reputation: 862751
I think the best is use json_normalize
:
from pandas.io.json import json_normalize
import json
with open('file.json') as data_file:
data = json.load(data_file)
df = json_normalize(data)
EDIT:
If not possible use:
import ast
from pandas.io.json import json_normalize
#convert strings to lists and dicts
df['Data'] = df['Data'].apply(ast.literal_eval)
#parse Data column
df1 = pd.concat([json_normalize(x) for x in df['Data'].values.tolist()], keys= df.index)
#append to original
df1 = df.drop('Data', 1).join(df1.reset_index(level=1, drop=True)).reset_index(drop=True)
print (df1)
Name Identifier ratio ref.id ref.type side
0 Joe 54872 1 86669 4 Buy
1 Joe 54872 1 80843 4 Sell
2 Jill 84756 1 75236 4 Buy
3 Jill 84756 1 75565 4 Sell
Upvotes: 2