acpigeon
acpigeon

Reputation: 1719

Pandas group by and sum two columns

Beginner question. This seems like it should be a straightforward operation, but I can't figure it out from reading the docs.

I have a df with this structure:

|integer_id|int_field_1|int_field_2|

The integer_id column is non-unique, so I'd like to group the df by integer_id and sum the two fields.

The equivalent SQL is:

SELECT integer_id, SUM(int_field_1), SUM(int_field_2) FROM tbl
GROUP BY integer_id

Any suggestions on the simplest way to do this?

EDIT: Including input/output

Input:  
integer_id  int_field_1 int_field_2   
2656        36          36  
2656        36          36  
9702        2           2  
9702        1           1  

Ouput using df.groupby('integer_id').sum():

integer_id  int_field_1 int_field_2  
2656        72          72  
9702        3           3  

Upvotes: 7

Views: 19331

Answers (3)

xxyjoel
xxyjoel

Reputation: 591

A variation on the .agg() function; provides the ability to (1) persist type DataFrame, (2) apply averages, counts, summations, etc. and (3) enables groupby on multiple columns while maintaining legibility.

df.groupby(['att1', 'att2']).agg({'att1': "count", 'att3': "sum",'att4': 'mean'})

using your values...

df.groupby(['integer_id']).agg({'int_field_1': "sum", 'int_field_2': "sum" })

Upvotes: 0

Bastin Robin
Bastin Robin

Reputation: 938

You can do it

data.groupby(by=['account_ID'])['purchases'].sum()

Upvotes: 1

EdChum
EdChum

Reputation: 393873

You just need to call sum on a groupby object:

df.groupby('integer_id').sum()

See the docs for further examples

Upvotes: 11

Related Questions