JAG2024
JAG2024

Reputation: 4317

Write python function that sums values from certain rows for each index type using groupby

In my dataframe, df, I am trying to sum the values from the value column for each Product and Year for two periods of the year (Month), specifically Months 1 through 3 and Months 9 through 11. I know I need to use groupby to group Products and Years, and possibly use a lambda function (or an if statement) to separate the two periods of time.

Here's my data frame df:

import pandas as pd
products = {'Product': ['A','A','A','A','A','A','B','B','B','B','C','C','C','C','C',
                       'C','C','C'],
            'Month': [1,1,3,4,5,10,4,5,10,11,2,3,5,3,9,
                       10,11,12],
            'Year': [1999,1999,1999,1999,1999,1999,2017,2017,1988,1988,2002,2002,2002,2003,2003,
                       2003,2003,2003],
            'value': [250,810,1200,340,250,800,1200,400,250,800,1200,300,290,800,1200,300, 1200, 300]
            }

df = pd.DataFrame(products, columns= ['Product', 'Month','Year','value'])
df

And I want a table that looks something like this:

products = {'Product': ['A','A','B','B','C','C','C'],
            'MonthGroups': ['Month1:3','Month9:11','Month1:3','Month9:11','Month1:3','Month1:3','Month9:11'],
            'Year': [1999,1999,2017,1988,2002, 2003, 2003],
            'SummedValue': [2260, 800, 0, 1050, 1500, 800, 2700]
            }

new_df = pd.DataFrame(products, columns= ['Product', 'MonthGroups','Year','SummedValue'])
new_df

What I have so far that is that I should use groupby to group Product and Year. What I'm stuck on is defining the two "Month Groups": Months 1 through 3 and Months 9 through 11, which should be the sum of value per year.

df.groupby(['Product','Year']).value.sum().loc[lambda p: p > 10].to_frame()

This isn't right though because it needs to sum based on the month groups.

Upvotes: 1

Views: 61

Answers (2)

anky
anky

Reputation: 75100

A little different approach using pd.cut:

bins = [0,3,8,11]
s = pd.cut(df['Month'],bins,labels=['1:3','irrelevant','9:11'])

(df[s.isin(['1:3','9:11'])].assign(MonthGroups=s.astype(str))
.groupby(['Product','MonthGroups','Year'])['value'].sum().reset_index())

  Product MonthGroups  Year  value
0       A         1:3  1999   2260
1       A        9:11  1999    800
2       B        9:11  1988   1050
3       C         1:3  2002   1500
4       C         1:3  2003    800
5       C        9:11  2003   2700

Upvotes: 2

jezrael
jezrael

Reputation: 863236

First created new column by numpy.select with DataFrame.assign, then aggregate also by MonthGroups and because groupby by default remove rows with misisng values if column used for by parameter (like here MonthGroups) are omitted not matched groups:

df1 = (df.assign(MonthGroups = np.select([df['Month'].between(1,3),
                                         df['Month'].between(9,11)],
                                         ['Month1:3','Month9:11'], default=None))
         .groupby(['Product','MonthGroups','Year']).value
         .sum()
         .reset_index(name='SummedValue')
         )
print (df1)
  Product MonthGroups  Year  SummedValue
0       A    Month1:3  1999         2260
1       A   Month9:11  1999          800
2       B   Month9:11  1988         1050
3       C    Month1:3  2002         1500
4       C    Month1:3  2003          800
5       C   Month9:11  2003         2700

If need also 0 sum values for not matched rows:

df2 = df[['Product','Year']].drop_duplicates().assign(MonthGroups='Month1:3',SummedValue=0)

df1 = (df.assign(MonthGroups = np.select([df['Month'].between(1,3),
                                         df['Month'].between(9,11)], 
                                         ['Month1:3','Month9:11'], default=None))
         .groupby(['Product','MonthGroups','Year']).value
         .sum()
         .reset_index(name='SummedValue')
         .append(df2)
         .drop_duplicates(['Product','MonthGroups','Year'])
         )
print (df1)
  Product MonthGroups  Year  SummedValue
0       A    Month1:3  1999         2260
1       A   Month9:11  1999          800
2       B   Month9:11  1988         1050
3       C    Month1:3  2002         1500
4       C    Month1:3  2003          800
5       C   Month9:11  2003         2700
6       B    Month1:3  2017            0
8       B    Month1:3  1988            0

Upvotes: 3

Related Questions