Reputation: 749
I have the following,
import pandas as pd
data = [['AAA','2019-01-01', 10], ['AAA','2019-01-02', 20],
['AAA','2019-02-01', 30], ['AAA','2019-02-02', 40],
['BBB','2019-01-01', 50], ['BBB','2019-01-02', 60],
['BBB','2019-02-01', 70],['BBB','2019-02-02', 80]]
dfx = pd.DataFrame(data, columns = ['NAME', 'TIMESTAMP','VALUE'])
NAME TIMESTAMP VALUE
0 AAA 2019-01-01 10
1 AAA 2019-01-02 20
2 AAA 2019-02-01 30
3 AAA 2019-02-02 40
4 BBB 2019-01-01 50
5 BBB 2019-01-02 60
6 BBB 2019-02-01 70
7 BBB 2019-02-02 80
I am trying to do a sum on the column 'VALUE' grouped by MONTH and YEAR in the 'TIMESTAMP' and 'NAME' columns.
So the final desired output is,
NAME TIMESTAMP VALUE SUM
0 AAA 2019-01-01 10 30
1 AAA 2019-01-02 20 30
2 AAA 2019-02-01 30 70
3 AAA 2019-02-02 40 70
4 BBB 2019-01-01 50 110
5 BBB 2019-01-02 60 110
6 BBB 2019-02-01 70 150
7 BBB 2019-02-02 80 150
How do I get this output?
Thanks.
Upvotes: 2
Views: 255
Reputation: 862481
Use GroupBy.transform
with Series.dt.year
and
Series.dt.month
:
d = pd.to_datetime(dfx['TIMESTAMP'])
dfx['SUM'] = (dfx.groupby(['NAME',
dfx['TIMESTAMP'].dt.year,
dfx['TIMESTAMP'].dt.month])['VALUE']
.transform('sum'))
Or month period by Series.dt.to_period
:
dfx['SUM'] = (dfx.groupby(['NAME', dfx['TIMESTAMP'].dt.to_period('m')])['VALUE']
.transform('sum'))
print (dfx)
NAME TIMESTAMP VALUE SUM
0 AAA 2019-01-01 10 30
1 AAA 2019-01-02 20 30
2 AAA 2019-02-01 30 70
3 AAA 2019-02-02 40 70
4 BBB 2019-01-01 50 110
5 BBB 2019-01-02 60 110
6 BBB 2019-02-01 70 150
7 BBB 2019-02-02 80 150
Upvotes: 3