I have the following df:
     A  B     C    D
0  foo  a  1200  300
0  foo  a   700  300
0  foo  b  1000  300
1  bar  b   270   70
1  bar  a   350   70
2  abc  c   270  300
2  abc  a   350  300
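For reproducibility, the frame above can be rebuilt as follows. This is a sketch that assumes the repeated 0/1/2 values are literal (duplicated) index labels and that C and D are integers. It also checks the claim made below that column D holds a single value per value of A.

```python
import pandas as pd

# Rebuild the example frame; the index labels are deliberately duplicated
df = pd.DataFrame(
    {
        "A": ["foo", "foo", "foo", "bar", "bar", "abc", "abc"],
        "B": ["a", "a", "b", "b", "a", "c", "a"],
        "C": [1200, 700, 1000, 270, 350, 270, 350],
        "D": [300, 300, 300, 70, 70, 300, 300],
    },
    index=[0, 0, 0, 1, 1, 2, 2],
)

# Each value of A should map to exactly one distinct D value
d_is_constant_per_a = bool(df.groupby("A")["D"].nunique().eq(1).all())
print(df)
print(d_is_constant_per_a)
```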
I want to display the sum of the values in column D grouped by column B, but I do not want to add up the values in column D more than once for a single value in column A. That is, column D has only one value per value in column A.
foo will only ever have the value 300 and bar will only have the value 70 in column D. The values in this column are just repeated because I have repeated indexes.
I want to print something like the following (the formatting doesn't matter; I just need the correct sums):
a: 300 (from foo) + 70 (from bar) + 300 (from abc) = 670
b: 300 (from foo) + 70 (from bar) = 370
c: 300 (from abc)
That is, values in column D should not be summed together if they share the same value in column A.
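A minimal sketch of how that breakdown could be printed, assuming (as stated above) that D is constant within each value of A, so keeping one row per (B, A) pair is enough. drop_duplicates keeps the first such row; each B group's D values are then listed and summed. The names pairs and lines are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["foo", "foo", "foo", "bar", "bar", "abc", "abc"],
        "B": ["a", "a", "b", "b", "a", "c", "a"],
        "C": [1200, 700, 1000, 270, 350, 270, 350],
        "D": [300, 300, 300, 70, 70, 300, 300],
    },
    index=[0, 0, 0, 1, 1, 2, 2],
)

# Keep one row per (B, A) pair; valid because D is constant within each A
pairs = df.drop_duplicates(subset=["B", "A"])

lines = []
for b, grp in pairs.groupby("B"):
    # List each contributing D value together with the A it came from
    terms = " + ".join(f"{d} (from {a})" for a, d in zip(grp["A"], grp["D"]))
    lines.append(f"{b}: {terms} = {grp['D'].sum()}")

print("\n".join(lines))
```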
Answer:
You could use pd.unique() after the groupby and then sum the unique values.
df.groupby('B')['D'].apply(lambda x: sum(pd.unique(x)))
B
a 370
b 370
Name: D, dtype: int64
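On the updated data (with the abc rows), this first version no longer produces the desired totals: pd.unique removes repeated D values regardless of which A they came from, so in group a the 300 from foo and the 300 from abc collapse into a single 300. A quick check:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["foo", "foo", "foo", "bar", "bar", "abc", "abc"],
        "B": ["a", "a", "b", "b", "a", "c", "a"],
        "C": [1200, 700, 1000, 270, 350, 270, 350],
        "D": [300, 300, 300, 70, 70, 300, 300],
    },
    index=[0, 0, 0, 1, 1, 2, 2],
)

# Deduplicates by D value within each B group, not by A
first_try = df.groupby("B")["D"].apply(lambda x: sum(pd.unique(x)))
print(first_try)  # a -> 370 (not the desired 670), b -> 370, c -> 300
```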
UPDATE: For your new example, you are looking for something like this:
df.groupby(['B','A'])['D'].apply(lambda x: sum(pd.unique(x))).groupby('B').sum()
Output:
B
a 670
b 370
c 300
Name: D, dtype: int64
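The chained expression above can be unpacked into two steps. This sketch swaps the inner apply(lambda x: sum(pd.unique(x))) for .first(), which gives the same result only under the question's assumption that D is constant within each A; it also uses level="B" to make the grouping on the index level explicit.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["foo", "foo", "foo", "bar", "bar", "abc", "abc"],
        "B": ["a", "a", "b", "b", "a", "c", "a"],
        "C": [1200, 700, 1000, 270, 350, 270, 350],
        "D": [300, 300, 300, 70, 70, 300, 300],
    },
    index=[0, 0, 0, 1, 1, 2, 2],
)

# Step 1: one D value per (B, A) pair -> Series with a (B, A) MultiIndex
per_pair = df.groupby(["B", "A"])["D"].first()

# Step 2: add up those per-pair values within each B
totals = per_pair.groupby(level="B").sum()
print(totals)
```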