Reputation: 407
0 [{'review_id': 4873356, 'rating': '5.0'}, {'review_id': 4973356, 'rating': '4.0'}]
1 [{'review_id': 4635892, 'rating': '5.0'}, {'review_id': 4645839, 'rating': '3.0'}]
I have a situation where I want to flatten such json as solved here: Converting array of arrays into flattened dataframe
But I want to create new columns so that the output is:
review_id_1 rating_1 review_id_2 rating_2
4873356 5.0 4973356 4.0
4635892 5.0 4645839 3.0
Any help is highly appreciated..
Upvotes: 0
Views: 541
Reputation: 71570
Try using:
print(pd.DataFrame(s.apply(lambda x: {a: b for i in [{x + str(i): y for x, y in v.items()} for i, v in enumerate(x, 1)] for a, b in i.items()}).tolist()))
Output:
rating1 rating2 review_id1 review_id2
0 5.0 4.0 4873356 4973356
1 5.0 3.0 4635892 4645839
Upvotes: 1
Reputation: 109526
This type of data munging tends to be manual.
# Sample data.
df = pd.DataFrame({
'json_data': [
[{'review_id': 4873356, 'rating': '5.0'}, {'review_id': 4973356, 'rating': '4.0'}],
[{'review_id': 4635892, 'rating': '5.0'}, {'review_id': 4645839, 'rating': '3.0'}],
]
})
# Data transformation:
# Step 1: Temporary dataframe that splits data from `df` into two columns.
df2 = pd.DataFrame(zip(*df['json_data']))
# Step 2: Use a list comprehension to concatenate the records from each column so that the df now has 4 columns.
df2 = pd.concat([pd.DataFrame.from_records(df2[col]) for col in df2], axis=1)
# Step 3: Rename final columns
df2.columns = ['review_id_1', 'rating_1', 'review_id_2', 'rating_2']
>>> df2
review_id_1 rating_1 review_id_2 rating_2
0 4873356 5.0 4635892 5.0
1 4973356 4.0 4645839 3.0
Upvotes: 1