Reputation: 115
I'm trying to add a new column with the sum of the values of another column, but only for distinct rows. Say we have this dataframe:
col1 col2 vote
0 a 2 5
1 a 2 5
2 b 2 2
3 c 4 1
4 d 3 5
5 d 3 5
6 d 3 5
I want to group by col2, and then sum all the votes of distinct pairs of col1 and col 2 (the vote is always the same for each pair of col1 and col2.
I'm trying to obtain this result:
col1 col2 vote aggrVote
0 a 2 5 7
1 a 2 5 7
2 b 2 2 7
3 c 4 1 1
4 d 3 5 5
5 d 3 5 5
6 d 3 5 5
The value 3 for col2 has only one distinct col1's value, so the vote is just 5. the value 2 for col2 has 3 distinct col1's values (a,b) so the sum is 5 + 2
Upvotes: 1
Views: 4070
Reputation: 27869
This should do it all in one line:
df
# col1 col2 vote
#0 a 2 5
#1 a 2 5
#2 b 2 2
#3 c 4 1
#4 d 3 5
#5 d 3 5
#6 d 3 5
df['aggrVote'] = df.groupby(['col2'])['vote'].transform(lambda x: x.unique().sum())
df
# col1 col2 vote aggrVote
#0 a 2 5 7
#1 a 2 5 7
#2 b 2 2 7
#3 c 4 1 1
#4 d 3 5 5
#5 d 3 5 5
#6 d 3 5 5
Upvotes: 0
Reputation: 75080
IIUC, you need:
s = df.drop_duplicates(['col1','col2']).groupby('col2')['vote'].sum() #thanks @jez
df['aggrVote']=df.col2.map(s)
print(df)
col1 col2 vote aggrVote
0 a 2 5 7
1 a 2 5 7
2 b 2 2 7
3 c 4 1 1
4 d 3 5 5
5 d 3 5 5
6 d 3 5 5
Upvotes: 4