Reputation: 1719
Beginner question. This seems like it should be a straightforward operation, but I can't figure it out from reading the docs.
I have a df with this structure:
|integer_id|int_field_1|int_field_2|
The integer_id column is non-unique, so I'd like to group the df by integer_id and sum the two fields.
The equivalent SQL is:
SELECT integer_id, SUM(int_field_1), SUM(int_field_2) FROM tbl
GROUP BY integer_id
Any suggestions on the simplest way to do this?
EDIT: Including input/output
Input:
integer_id int_field_1 int_field_2
2656 36 36
2656 36 36
9702 2 2
9702 1 1
Ouput using df.groupby('integer_id').sum():
integer_id int_field_1 int_field_2
2656 72 72
9702 3 3
Upvotes: 7
Views: 19331
Reputation: 591
A variation on the .agg() function; provides the ability to (1) persist type DataFrame, (2) apply averages, counts, summations, etc. and (3) enables groupby on multiple columns while maintaining legibility.
df.groupby(['att1', 'att2']).agg({'att1': "count", 'att3': "sum",'att4': 'mean'})
using your values...
df.groupby(['integer_id']).agg({'int_field_1': "sum", 'int_field_2': "sum" })
Upvotes: 0
Reputation: 938
You can do it
data.groupby(by=['account_ID'])['purchases'].sum()
Upvotes: 1