babsdoc
babsdoc

Reputation: 749

Python Pandas group by month and year

I have the following,

import pandas as pd

data = [['AAA','2019-01-01', 10], ['AAA','2019-01-02', 20],
        ['AAA','2019-02-01', 30], ['AAA','2019-02-02', 40],
        ['BBB','2019-01-01', 50], ['BBB','2019-01-02', 60],
        ['BBB','2019-02-01', 70],['BBB','2019-02-02', 80]]

dfx = pd.DataFrame(data, columns = ['NAME', 'TIMESTAMP','VALUE'])

  NAME   TIMESTAMP  VALUE
0  AAA  2019-01-01     10
1  AAA  2019-01-02     20
2  AAA  2019-02-01     30
3  AAA  2019-02-02     40
4  BBB  2019-01-01     50
5  BBB  2019-01-02     60
6  BBB  2019-02-01     70
7  BBB  2019-02-02     80

I am trying to do a sum on the column 'VALUE' grouped by MONTH and YEAR in the 'TIMESTAMP' and 'NAME' columns.

So the final desired output is,

  NAME   TIMESTAMP  VALUE SUM
0  AAA  2019-01-01     10  30
1  AAA  2019-01-02     20  30
2  AAA  2019-02-01     30  70
3  AAA  2019-02-02     40  70
4  BBB  2019-01-01     50 110
5  BBB  2019-01-02     60 110
6  BBB  2019-02-01     70 150
7  BBB  2019-02-02     80 150

How do I get this output?

Thanks.

Upvotes: 2

Views: 255

Answers (1)

jezrael
jezrael

Reputation: 862481

Use GroupBy.transform with Series.dt.year and Series.dt.month:

d = pd.to_datetime(dfx['TIMESTAMP'])
dfx['SUM'] = (dfx.groupby(['NAME', 
                           dfx['TIMESTAMP'].dt.year, 
                           dfx['TIMESTAMP'].dt.month])['VALUE']
                 .transform('sum'))

Or month period by Series.dt.to_period:

dfx['SUM'] = (dfx.groupby(['NAME', dfx['TIMESTAMP'].dt.to_period('m')])['VALUE']
                 .transform('sum'))

print (dfx)
  NAME   TIMESTAMP  VALUE  SUM
0  AAA  2019-01-01     10   30
1  AAA  2019-01-02     20   30
2  AAA  2019-02-01     30   70
3  AAA  2019-02-02     40   70
4  BBB  2019-01-01     50  110
5  BBB  2019-01-02     60  110
6  BBB  2019-02-01     70  150
7  BBB  2019-02-02     80  150

Upvotes: 3

Related Questions