Reputation: 247
I have a table that looks like this.
Date Hospital Total Operations Errors
2001-10-01 Hospital A 1101 0
2001-10-01 Hospital B 32 0
2001-10-02 Hospital A 1184 0
2001-10-02 Hospital B 74 0
2001-10-03 Hospital A 1350 0
2001-10-03 Hospital B 72 0
2001-10-04 Hospital A 1364 0
2001-10-04 Hospital B 232 0
2001-10-05 Hospital A 1345 0
2001-10-05 Hospital B 212 0
And want to accrue totals for each hospital every day and add them to that specific day's row so that my final table can look like this.
Date Hospital Total Operations Errors Aggregated Operations Aggregated Errors
2001-10-01 Hospital A 1101 0 1101 0
2001-10-01 Hospital B 32 0 32 0
2001-10-02 Hospital A 1184 0 2285 0
2001-10-02 Hospital B 74 0 106 0
2001-10-03 Hospital A 1350 0 3635 0
2001-10-03 Hospital B 72 0 178 0
2001-10-04 Hospital A 1364 0 4999 0
2001-10-04 Hospital B 232 0 410 0
2001-10-05 Hospital A 1345 0 6344 0
2001-10-05 Hospital B 212 0 622 0
Is there a way to do that in python? Any assistance is truly appreciated.
Upvotes: 1
Views: 67
Reputation: 31236
Simple use of transform()
and concat results to original df
df = pd.concat([df, df.groupby(['Hospital']).transform("cumsum").add_suffix(" Cumulative")], axis=1)
Upvotes: 1
Reputation: 18377
You can use groupby()
and cumsum()
as I suggested previously in the comments:
df = pd.DataFrame({'Date':['2001-10-01','2001-10-01','2001-10-02','2001-10-02','2001-10-03','2001-10-03','2001-10-04','2001-10-04'],
'Hospital':['Hospital A','Hospital B','Hospital A','Hospital B','Hospital A','Hospital B','Hospital A','Hospital B'],
'Total Operations':[1101,32,1184,74,1350,72,1364,232],
'Errors':[0,0,0,0,0,0,0,0]})
df['Aggreated Operations'] = df.groupby(['Hospital'])['Total Operations'].cumsum()
df['Aggreated Erros'] = df.groupby(['Hospital'])['Errors'].cumsum()
print(df)
This will output:
Date Hospital ... Aggreated Operations Aggreated Erros
0 2001-10-01 Hospital A ... 1101 0
1 2001-10-01 Hospital B ... 32 0
2 2001-10-02 Hospital A ... 2285 0
3 2001-10-02 Hospital B ... 106 0
4 2001-10-03 Hospital A ... 3635 0
5 2001-10-03 Hospital B ... 178 0
6 2001-10-04 Hospital A ... 4999 0
7 2001-10-04 Hospital B ... 410 0
Upvotes: 3
Reputation: 16172
df.join(df.sort_values(by=['Date','Hospital']).groupby(['Hospital']).cumsum(), rsuffix=' Aggregated')
Output
Date Hospital Total Operations Errors Total Operations Aggregated Errors Aggregated
0 2001-10-01 Hospital A 1101 0 1101 0
1 2001-10-01 Hospital B 32 0 32 0
2 2001-10-02 Hospital A 1184 0 2285 0
3 2001-10-02 Hospital B 74 0 106 0
4 2001-10-03 Hospital A 1350 0 3635 0
5 2001-10-03 Hospital B 72 0 178 0
6 2001-10-04 Hospital A 1364 0 4999 0
7 2001-10-04 Hospital B 232 0 410 0
8 2001-10-05 Hospital A 1345 0 6344 0
9 2001-10-05 Hospital B 212 0 622 0
Upvotes: 1
Reputation: 8778
This code should work.
df['Aggregated Operations'] = df.groupby('Hospital')['Total Operations'].cumsum()
Upvotes: 1