How to sum distinct rows in a pandas DataFrame

Question

I'm trying to add a new column with the sum of the values of another column, but only for distinct rows. Say we have this dataframe:

    col1     col2   vote
0     a        2     5
1     a        2     5
2     b        2     2
3     c        4     1
4     d        3     5
5     d        3     5
6     d        3     5

I want to group by col2 and then sum the votes of the distinct (col1, col2) pairs (the vote is always the same for a given pair of col1 and col2).

I'm trying to obtain this result:

    col1     col2   vote   aggrVote
0     a        2     5        7
1     a        2     5        7
2     b        2     2        7
3     c        4     1        1
4     d        3     5        5
5     d        3     5        5
6     d        3     5        5

The value 3 for col2 has only one distinct col1 value (d), so the sum is just 5. The value 2 for col2 has 2 distinct col1 values (a, b), so the sum is 5 + 2 = 7.

anky · Accepted Answer

IIUC, you need:

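# Keep one row per distinct (col1, col2) pair, then sum the votes per col2
s = df.drop_duplicates(['col1', 'col2']).groupby('col2')['vote'].sum()  # thanks @jez
# Look up each row's col2 value in s to get its group's sum
df['aggrVote'] = df['col2'].map(s)
print(df)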

  col1  col2  vote  aggrVote
0    a     2     5         7
1    a     2     5         7
2    b     2     2         7
3    c     4     1         1
4    d     3     5         5
5    d     3     5         5
6    d     3     5         5
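
For anyone who wants to run this end to end, here is a self-contained sketch of the same approach, split into steps. The DataFrame is rebuilt from the question's sample data, and the intermediate names (unique_pairs, votes_per_col2) are just illustrative, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "col1": ["a", "a", "b", "c", "d", "d", "d"],
    "col2": [2, 2, 2, 4, 3, 3, 3],
    "vote": [5, 5, 2, 1, 5, 5, 5],
})

# Step 1: keep a single row for each distinct (col1, col2) pair
unique_pairs = df.drop_duplicates(["col1", "col2"])

# Step 2: sum the votes of those distinct pairs within each col2 group;
# the result is a Series indexed by col2
votes_per_col2 = unique_pairs.groupby("col2")["vote"].sum()

# Step 3: look up each row's col2 value in that Series
df["aggrVote"] = df["col2"].map(votes_per_col2)

print(df)
```

Deduplicating first is what stops the repeated (a, 2) and (d, 3) rows from being counted more than once. This relies on the vote being identical within each (col1, col2) pair, as the question states; if that did not hold, drop_duplicates would silently keep only the first vote it sees for each pair.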
