Pandas: aggregate based on filter on another column

Question

I have a dataframe that looks like this

Month   Fruit   Sales
1       Apple   45
1       Bananas 12
3       Apple   6
1       Kiwi    34
12      Melon   12

I'm trying to get a dataframe that goes like this

Fruit         Sales (month=1)     Sales (month=2)
Apple         55                  65
Bananas       12                  102
Kiwi          54                  78
Melon         132                 43

Right now I have

df=df.groupby(['Fruit']).agg({'Sales':np.sum}).reset_index()

There has to be some way to filter the arguments within agg() based on the "Month" variable. I just haven't been able to find it in the docs. Any help?

Edit: Thanks for the solutions. To complicate things, I would like to sum up another column as well. Example:

Month    Fruit    Sales  Revenue
1       Apple    45     45
1       Bananas  12     12
3       Apple    6      6
1       Kiwi     34     34
12      Melon    12     12

The preferred output would be similar to

            Sales      Revenue
     Fruit   1  3  12  1   3  12
0    Apple  61  6   0  61  6  0
1  Bananas  12  6   0  12  6  0
2     Kiwi  34  0   0  34  0  0
3    Melon   0  0  12  0   0  12

I managed to get this with df.pivot_table(values=['Sales','Revenue'], index='Fruit', columns=['Month'], aggfunc='np.sum').reset_index(), so my problem is resolved.

I attempted the same thing with df.groupby(['Fruit', 'Month'])['Sales','Revenue'].sum().unstack('Month', fill_value=0).rename_axis(None, 1).reset_index(), but this throws a TypeError. Can the above operation be done with groupby as well?

dotcs · Accepted Answer

To answer the updated question you should do things a little differently. First group by the elements that should be the columns afterwards (Month and Fruit). Then calculate the sum of those groups and unstack the DataFrame afterwards which leaves the Fruit column as the index column.

data = '''
Month    Fruit   Sales  Revenue
1       Apple    45     45
1       Bananas  12     12
1       Apple    16     16
3       Apple    6      6
1       Kiwi     34     34
3       Bananas  6      6
12      Melon    12     12
'''
df = pd.read_csv(StringIO(data), sep='\s+')

df.groupby(['Month', 'Fruit'])\
    .sum()\
    .unstack(level=0)

Result

        Sales            Revenue           
Month      1    3     12      1    3     12
Fruit                                      
Apple    61.0  6.0   NaN    61.0  6.0   NaN
Bananas  12.0  6.0   NaN    12.0  6.0   NaN
Kiwi     34.0  NaN   NaN    34.0  NaN   NaN
Melon     NaN  NaN  12.0     NaN  NaN  12.0

old answer

Use the pivot_table method:

import pandas as pd
from io import StringIO

data = '''\
Month Fruit  Sales
1       Apple   45
1       Bananas 12
1       Apple   16
3       Apple   6
1       Kiwi    34
3       Bananas 6
12      Melon   12
'''
df = pd.read_csv(StringIO(data), sep='\s+')

df.pivot_table('Sales', index='Fruit', columns=['Month'], aggfunc='sum')

Result:

Month      1    3     12
Fruit                   
Apple    61.0  6.0   NaN
Bananas  12.0  6.0   NaN
Kiwi     34.0  NaN   NaN
Melon     NaN  NaN  12.0

Pandas: aggregate based on filter on another column

Answers (2)

old answer

Related Questions