alex_T
alex_T

Reputation: 115

How to sum distinct rows in a pandas Dataframe

I'm trying to add a new column with the sum of the values of another column, but only for distinct rows. Say we have this dataframe:

    col1     col2   vote
0     a        2     5
1     a        2     5
2     b        2     2
3     c        4     1
4     d        3     5
5     d        3     5
6     d        3     5

I want to group by col2, and then sum all the votes of distinct pairs of col1 and col 2 (the vote is always the same for each pair of col1 and col2.

I'm trying to obtain this result:

    col1     col2   vote   aggrVote
0     a        2     5        7
1     a        2     5        7
2     b        2     2        7
3     c        4     1        1
4     d        3     5        5
5     d        3     5        5
6     d        3     5        5

The value 3 for col2 has only one distinct col1's value, so the vote is just 5. the value 2 for col2 has 3 distinct col1's values (a,b) so the sum is 5 + 2

Upvotes: 1

Views: 4070

Answers (2)

zipa
zipa

Reputation: 27869

This should do it all in one line:

df
#  col1  col2  vote
#0    a     2     5
#1    a     2     5
#2    b     2     2
#3    c     4     1
#4    d     3     5
#5    d     3     5
#6    d     3     5

df['aggrVote'] = df.groupby(['col2'])['vote'].transform(lambda x: x.unique().sum())

df
#  col1  col2  vote  aggrVote
#0    a     2     5         7
#1    a     2     5         7
#2    b     2     2         7
#3    c     4     1         1
#4    d     3     5         5
#5    d     3     5         5
#6    d     3     5         5

Upvotes: 0

anky
anky

Reputation: 75080

IIUC, you need:

s = df.drop_duplicates(['col1','col2']).groupby('col2')['vote'].sum() #thanks @jez
df['aggrVote']=df.col2.map(s)
print(df)

  col1  col2  vote  aggrVote
0    a     2     5         7
1    a     2     5         7
2    b     2     2         7
3    c     4     1         1
4    d     3     5         5
5    d     3     5         5
6    d     3     5         5

Upvotes: 4

Related Questions