Vega
Vega

Reputation: 2929

Get sum per month from daily data into new column while keeping daily data

I have a df with daily data and some levels per day:

date       | value1 | value2 | level
2020-01-01 | 1      | 2      | "a"   
2020-01-01 | 3      | 10     | "b"   
2020-01-01 | 2      | 3      | "c"   
2020-01-02 | 1      | 2      | "a"   
2020-01-02 | 3      | 10     | "b"  
2020-01-02 | 2      | 3      | "c"  
...        | ...    | ...    | ...   
2021-02-01 | 10     | 1      | "a"   
2021-02-01 | 8      | 4      | "b"   
2021-02-01 | 1      | 5      | "c"  
2021-02-03 | 10     | 1      | "a" 
2021-02-03 | 8      | 4      | "b"   
2021-02-03 | 1      | 5      | "c"   

I need the sum per month of value1 and value2 in a new column while keeping the daily rows, like:

date       | value1 | value2 | level | value1_permonth | value2_permonth
2020-01-01 | 1      | 2      | "a"   | 12               | 30
2020-01-01 | 3      | 10     | "b"   | 12               | 30
2020-01-01 | 2      | 3      | "c"   | 12               | 30
2020-01-02 | 1      | 2      | "a"   | 12               | 30
2020-01-02 | 3      | 10     | "b"   | 12               | 30
2020-01-02 | 2      | 3      | "c"   | 12               | 30
...        | ...    | ...    | ...   | ...              | ...
2021-02-01 | 10     | 1      | "a"   | 38               | 20
2021-02-01 | 8      | 4      | "b"   | 38               | 20
2021-02-01 | 1      | 5      | "c"   | 38               | 20
2021-02-03 | 10     | 1      | "a"   | 38               | 20
2021-02-03 | 8      | 4      | "b"   | 38               | 20
2021-02-03 | 1      | 5      | "c"   | 38               | 20

How can I do this with pandas?

Upvotes: 1

Views: 224

Answers (1)

jezrael
jezrael

Reputation: 862581

Use Grouper with GroupBy.transform for new columns filled by aggregate values:

cols = ['value1','value2']
df1 = df.groupby(pd.Grouper(freq='MS', key='date'))[cols].transform('sum')

Or DataFrame.resample:

cols = ['value1','value2']
df1 = df.resample('MS', on='date')[cols].transform('sum')

Or use monthle periods by Series.dt.to_period passed to groupby:

cols = ['value1','value2']
df1 = df.groupby(df['date'].dt.to_period('m'))[cols].transform('sum')
print (df1)

df2 = df.join(df1.add_suffix('_permonth'))

print (df2)
         date  value1  value2 level  value1_permonth  value2_permonth
0  2020-01-01       1       2     a               12               30
1  2020-01-01       3      10     b               12               30
2  2020-01-01       2       3     c               12               30
3  2020-01-02       1       2     a               12               30
4  2020-01-02       3      10     b               12               30
5  2020-01-02       2       3     c               12               30
6  2021-02-01      10       1     a               38               20
7  2021-02-01       8       4     b               38               20
8  2021-02-01       1       5     c               38               20
9  2021-02-03      10       1     a               38               20
10 2021-02-03       8       4     b               38               20
11 2021-02-03       1       5     c               38               20

Upvotes: 1

Related Questions