Reputation: 1973
I have a sample dataframe as
sample_df = pd.DataFrame({'id': [1, 2], 'fruits' :[
[{'name': u'mango', 'cost': 100, 'color': u'yellow', 'size': 12}],
[{'name': u'mango', 'cost': 150, 'color': u'yellow', 'size': 21},
{'name': u'banana', 'cost': 200, 'color': u'green', 'size': 10} ]
]})
I would like to flatten the fruits
column to get new columns like name, cost, color
and size
. One id
can have more than 1 fruit entry. For example id 2
has information for 2 fruits mango
and banana
print(sample_df)
fruits id
0 [{'name': 'mango', 'cost': 100, 'color': 'yell... 1
1 [{'name': 'mango', 'cost': 150, 'color': 'yell... 2
In the output I would like to have 3 records, 1 record with fruit information for id 1
and 2 records for fruit information for id 2
Is there a way to parse this structure using pandas ?
Upvotes: 2
Views: 135
Reputation: 323396
First unnesting
your columns , then concat
the values after called DataFrame
s=unnesting(sample_df,['fruits']).reset_index(drop=True)
df=pd.concat([s.drop('fruits',1),pd.DataFrame(s.fruits.tolist())],axis=1)
df
Out[149]:
id color cost name size
0 1 yellow 100 mango 12
1 2 yellow 150 mango 21
2 2 green 200 banana 10
def unnesting(df, explode):
idx = df.index.repeat(df[explode[0]].str.len())
df1 = pd.concat([
pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
df1.index = idx
return df1.join(df.drop(explode, 1), how='left')
Method 2
sample_df.set_index('id').fruits.apply(pd.Series).stack().apply(pd.Series).reset_index(level=0)
Out[159]:
id color cost name size
0 1 yellow 100 mango 12
0 2 yellow 150 mango 21
1 2 green 200 banana 10
Upvotes: 2