Reputation: 6753
I have a dataframe like such:
Fruit Brand Supermarket Sold Revenue
Apple A X 9 2
Apple A Y 2 0.5
Apple B X 2 1
Orange A X 4 1.3
Orange B Y 0 0
Banana A X 3 0
Apple B Z 0 0
.......
What I want to do is to summarize these columns relative to the fruit type and Brand. i.e. a row for Apples of Brand A, Apples of Brand B and so on.
But there is a twist, I also want the 'Sold' and 'Revenue' columns to be divided into three columns for each supermarket(X,Y,Z) such as:
Fruit Brand Supermarket Sold_x Sold_y Sold_z Revenue_x Revenue_y Revenue_z
Apple A X 9 0 0 2 0 0
Apple A Y 0 2 0 0 0.5 0
Whats the most efficient way to do this in Pandas?
Upvotes: 0
Views: 120
Reputation: 294258
df.set_index(['Fruit', 'Brand', 'Supermarket']).unstack(fill_value=0.)
Upvotes: 1
Reputation: 16241
This looks like the perfect job for pivot_table
:
df.pivot_table(index=['Fruit', 'Brand'], columns='Supermarket').fillna(0)
Out[8]:
Sold Revenue
Supermarket X Y Z X Y Z
Fruit Brand
Apple A 9.0 2.0 0.0 2.0 0.5 0.0
B 2.0 0.0 0.0 1.0 0.0 0.0
Banana A 3.0 0.0 0.0 0.0 0.0 0.0
Orange A 4.0 0.0 0.0 1.3 0.0 0.0
B 0.0 0.0 0.0 0.0 0.0 0.0
All you need now is to rename the columns.
Upvotes: 1
Reputation: 7476
What about this solution? Essentially I do two subsequent groupby, first to group on Fruit and Brand, second to group by supermarket.
cols = ['Fruit', 'Brand', 'Supermarket', 'Sold', 'Revenue']
df = pd.DataFrame([['Apple', 'A', 'X', 9, 2],
['Apple', 'A', 'Y', 2, 0.5],
['Apple', 'B', 'X', 2, 1],
['Orange', 'A', 'X', 4, 1.3],
['Orange', 'B', 'Y', 0, 0],
['Banana', 'A', 'X', 3, 0],
['Apple', 'B', 'Z', 0, 0]], columns = cols)
out = {}
for key, group in df.groupby(['Fruit', 'Brand']):
out[key] = {}
for subkey, subgroup in group.groupby('Supermarket'):
out[key][subkey] = subgroup[['Supermarket', 'Sold', 'Revenue']].add_suffix("_"+subkey.lower())
out[key] = pd.concat(out[key], axis = 1)
out = pd.concat(out, axis = 0)
out.columns = out.columns.droplevel(0)
out = out.reset_index(level = 2, drop = True)\
.fillna(0)\
.reset_index().rename(columns = {'level_0' : 'Fruit', 'level_1' : 'Brand'})\
.loc[:, ['Fruit', 'Brand', 'Sold_x', 'Sold_y', 'Sold_z', 'Revenue_x', 'Revenue_y', 'Revenue_z']]
The final result is the following:
out
Out[60]:
Fruit Brand Sold_x Sold_y Sold_z Revenue_x Revenue_y Revenue_z
0 Apple A 9.0 0.0 0.0 2.0 0.0 0.0
1 Apple A 0.0 2.0 0.0 0.0 0.5 0.0
2 Apple B 2.0 0.0 0.0 1.0 0.0 0.0
3 Apple B 0.0 0.0 0.0 0.0 0.0 0.0
4 Banana A 3.0 0.0 0.0 0.0 0.0 0.0
5 Orange A 4.0 0.0 0.0 1.3 0.0 0.0
6 Orange B 0.0 0.0 0.0 0.0 0.0 0.0
Upvotes: 1