Reputation: 1407
I have a dataframe that looks like this
Month Fruit Sales
1 Apple 45
1 Bananas 12
3 Apple 6
1 Kiwi 34
12 Melon 12
I'm trying to get a dataframe that goes like this
Fruit Sales (month=1) Sales (month=2)
Apple 55 65
Bananas 12 102
Kiwi 54 78
Melon 132 43
Right now I have
df=df.groupby(['Fruit']).agg({'Sales':np.sum}).reset_index()
There has to be some way to filter the arguments within agg() based on the "Month" variable. I just haven't been able to find it in the docs. Any help?
Edit: Thanks for the solutions. To complicate things, I would like to sum up another column as well. Example:
Month Fruit Sales Revenue
1 Apple 45 45
1 Bananas 12 12
3 Apple 6 6
1 Kiwi 34 34
12 Melon 12 12
The preferred output would be similar to
Sales Revenue
Fruit 1 3 12 1 3 12
0 Apple 61 6 0 61 6 0
1 Bananas 12 6 0 12 6 0
2 Kiwi 34 0 0 34 0 0
3 Melon 0 0 12 0 0 12
I managed to get this with df.pivot_table(values=['Sales','Revenue'], index='Fruit', columns=['Month'], aggfunc='np.sum').reset_index()
, so my problem is resolved.
I attempted the same thing with df.groupby(['Fruit', 'Month'])['Sales','Revenue'].sum().unstack('Month', fill_value=0).rename_axis(None, 1).reset_index()
, but this throws a TypeError. Can the above operation be done with groupby
as well?
Upvotes: 7
Views: 3206
Reputation: 2296
To answer the updated question you should do things a little differently. First group by the elements that should be the columns afterwards (Month and Fruit). Then calculate the sum of those groups and unstack the DataFrame afterwards which leaves the Fruit column as the index column.
data = '''
Month Fruit Sales Revenue
1 Apple 45 45
1 Bananas 12 12
1 Apple 16 16
3 Apple 6 6
1 Kiwi 34 34
3 Bananas 6 6
12 Melon 12 12
'''
df = pd.read_csv(StringIO(data), sep='\s+')
df.groupby(['Month', 'Fruit'])\
.sum()\
.unstack(level=0)
Result
Sales Revenue
Month 1 3 12 1 3 12
Fruit
Apple 61.0 6.0 NaN 61.0 6.0 NaN
Bananas 12.0 6.0 NaN 12.0 6.0 NaN
Kiwi 34.0 NaN NaN 34.0 NaN NaN
Melon NaN NaN 12.0 NaN NaN 12.0
Use the pivot_table
method:
import pandas as pd
from io import StringIO
data = '''\
Month Fruit Sales
1 Apple 45
1 Bananas 12
1 Apple 16
3 Apple 6
1 Kiwi 34
3 Bananas 6
12 Melon 12
'''
df = pd.read_csv(StringIO(data), sep='\s+')
df.pivot_table('Sales', index='Fruit', columns=['Month'], aggfunc='sum')
Result:
Month 1 3 12
Fruit
Apple 61.0 6.0 NaN
Bananas 12.0 6.0 NaN
Kiwi 34.0 NaN NaN
Melon NaN NaN 12.0
Upvotes: 4
Reputation: 210912
UPDATE:
In [177]: df
Out[177]:
Month Fruit Sales Revenue
0 1 Apple 45 45
1 1 Bananas 12 12
2 3 Apple 6 6
3 1 Kiwi 34 34
4 12 Melon 12 12
In [178]: df.groupby(['Fruit', 'Month'])[['Sales','Revenue']].sum().unstack('Month', fill_value=0)
Out[178]:
Sales Revenue
Month 1 3 12 1 3 12
Fruit
Apple 45 6 0 45 6 0
Bananas 12 0 0 12 0 0
Kiwi 34 0 0 34 0 0
Melon 0 0 12 0 0 12
OLD answer:
Alternatively youcan use groupby()
+ unstack()
:
In [206]: df.groupby(['Fruit', 'Month'])['Sales'].sum().unstack('Month', fill_value=0) \
...: .rename_axis(None, 1).reset_index()
...:
Out[206]:
Fruit 1 3 12
0 Apple 61 6 0
1 Bananas 12 6 0
2 Kiwi 34 0 0
3 Melon 0 0 12
Upvotes: 0