Reputation: 1561
I have my Dataframe in the below structure. I would like to break them based on the nested values within the details column
cust_id, name, details
101, Kevin, [{"id":1001,"country":"US","state":"OH"}, {"id":1002,"country":"US","state":"GA"}]
102, Scott, [{"id":2001,"country":"US","state":"OH"}, {"id":2002,"country":"US","state":"GA"}]
Expected output
cust_id, name, id, country, state
101, Kevin, 1001, US, OH
101, Kevin, 1002, US, GA
102, Scott, 2001, US, OH
102, Scott, 2002, US, GA
Upvotes: 4
Views: 2195
Reputation:
df = df.explode('details').reset_index(drop=True)
df = df.merge(pd.json_normalize(df['details']), left_index=True, right_index=True).drop('details', axis=1)
df.explode("details")
basically duplicates each row in the details
N times, where N is the number of items in the array (if any) of details
of that rowexplode
duplicates the rows, the original rows' indices (0 and 1) are copied to the new rows, so their indices are 0, 0, 1, 1, which messes up later processing. reset_index()
creates a fresh new column for the index, starting at 0
. drop=True
is used because by default pandas will keep the old index column; this removes it.pd.json_normalize(df['details'])
converts the column (where each row contains a JSON object) to a new dataframe where each key unique of all the JSON objects is new columndf.merge()
merges the new dataframe into the original oneleft_index=True
and right_index=True
tells pandas to merge the specified dataframe starting from it's first, row into this dataframe, starting at its first row.drop('details', axis=1)
gets rid of the old details
column containing the old objectsUpvotes: 8