Reputation: 12657
I have the following dataframe
ID ID2 SCORE X Y
0 0 a 10 1 2
1 0 b 20 2 3
2 0 b 20 3 4
3 0 b 30 4 5
4 1 c 5 5 6
5 1 d 6 6 7
What I would like to do, is to groupby ID
and ID2
and to average the SCORE
taking into consideration only UNIQUE scores.
Now, if I use the standard df.groupby(['ID', 'ID2'])['SCORE'].mean()
I would get 23.33~, where what I am looking for is a score of 25.
I know I can filter out X
and Y
, drop the duplicates and do that, but I want to keep them as they are relevant.
How can I achieve that?
Upvotes: 6
Views: 9158
Reputation: 294516
You can get the unique scores within groups of ('ID', 'ID2')
by dropping duplicates before hand.
cols = ['ID', 'ID2', 'SCORE']
d1 = df.drop_duplicates(cols)
d1.groupby(cols[:-1]).SCORE.mean()
ID ID2
0 a 10
b 25
1 c 5
d 6
Name: SCORE, dtype: int64
Upvotes: 2
Reputation: 77027
You could also use
In [108]: df.drop_duplicates(['ID', 'ID2', 'SCORE']).groupby(['ID', 'ID2'])['SCORE'].mean()
Out[108]:
ID ID2
0 a 10
b 25
1 c 5
d 6
Name: SCORE, dtype: int64
Upvotes: 1
Reputation: 210972
If i understand correctly:
In [41]: df.groupby(['ID', 'ID2'])['SCORE'].agg(lambda x: x.unique().sum()/x.nunique())
Out[41]:
ID ID2
0 a 10
b 25
1 c 5
d 6
Name: SCORE, dtype: int64
or bit easier:
In [43]: df.groupby(['ID', 'ID2'])['SCORE'].agg(lambda x: x.unique().mean())
Out[43]:
ID ID2
0 a 10
b 25
1 c 5
d 6
Name: SCORE, dtype: int64
Upvotes: 11