How to sum values across groups without summing duplicates

Question

I have the following df:

     A    B        C       D 
0  foo    a     1200     300  
0  foo    a      700     300  
0  foo    b     1000     300         
1  bar    b      270      70 
1  bar    a      350      70
2  abc    c      270     300 
2  abc    a      350     300

I want to display the sum of the values in column D grouped by column B, but I do not want to count a value in column D more than once for a single value in column A. That is, column D has only one value per value in column A.

foo will only ever have the value 300 and bar will only have the value 70 in column D. The values in this column are just repeated because I have repeated indexes.

I want to print something like (no need to show formatting, I just need to output the correct sums):

a: 300 (from foo) + 70 (from bar) + 300 (from abc) = 670
b: 300 (from foo) + 70 (from bar) = 370
c: 300 (from abc)

That is, values in column D should not be summed together if the value in column A is the same among them.

Rabinzel · Accepted Answer

You could use pd.unique() after the groupby and then sum those values up.

df.groupby('B')['D'].apply(lambda x: sum(pd.unique(x)))

B
a    370
b    370
Name: D, dtype: int64
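As a rough check of where this first approach falls short, the sketch below rebuilds the frame from the question (the name naive is only illustrative) and applies the same one-liner. Because pd.unique() sees all D values of a B group at once, the 300 from foo and the 300 from abc collapse into a single 300, so group a comes out as 370 rather than the 670 the question asks for.

```python
import pandas as pd

# Frame from the question, including the repeated index values.
df = pd.DataFrame(
    {
        "A": ["foo", "foo", "foo", "bar", "bar", "abc", "abc"],
        "B": ["a", "a", "b", "b", "a", "c", "a"],
        "C": [1200, 700, 1000, 270, 350, 270, 350],
        "D": [300, 300, 300, 70, 70, 300, 300],
    },
    index=[0, 0, 0, 1, 1, 2, 2],
)

# Deduplicate D within each B group only; A is ignored here, so equal
# D values coming from different A groups are merged into one.
naive = df.groupby("B")["D"].apply(lambda x: sum(pd.unique(x)))
print(naive)
```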

UPDATE: For your new example, you are looking for something like this:

df.groupby(['B','A'])['D'].apply(lambda x: sum(pd.unique(x))).groupby('B').sum()

Output:

B
a    670
b    370
c    300
Name: D, dtype: int64
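A possible alternative, sketched here under the assumption stated in the question that D is constant for each value of A: keep one row per (A, B) pair with drop_duplicates() and then sum D per B. The names per_pair, alt and answer are illustrative only; the last lines compare the result with the groupby(['B','A']) approach above.

```python
import pandas as pd

# Frame from the question, including the repeated index values.
df = pd.DataFrame(
    {
        "A": ["foo", "foo", "foo", "bar", "bar", "abc", "abc"],
        "B": ["a", "a", "b", "b", "a", "c", "a"],
        "C": [1200, 700, 1000, 270, 350, 270, 350],
        "D": [300, 300, 300, 70, 70, 300, 300],
    },
    index=[0, 0, 0, 1, 1, 2, 2],
)

# Keep the first row of every (A, B) pair, so each A contributes its
# D value at most once to each B group (assumes D is constant per A).
per_pair = df.drop_duplicates(subset=["A", "B"])
alt = per_pair.groupby("B")["D"].sum()

# The approach from the answer, for comparison.
answer = (
    df.groupby(["B", "A"])["D"]
    .apply(lambda x: sum(pd.unique(x)))
    .groupby("B")
    .sum()
)

print(alt)
print(alt.to_dict() == answer.to_dict())
```

If D could differ within one (A, B) pair, drop_duplicates() would silently keep only the first value, whereas the pd.unique() version would add up each distinct value, so the two are interchangeable only under the assumption above.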
