bluesummers
bluesummers

Reputation: 12657

Pandas groupby and average across unique values

I have the following dataframe

   ID ID2  SCORE  X  Y
0   0   a     10  1  2
1   0   b     20  2  3
2   0   b     20  3  4
3   0   b     30  4  5
4   1   c      5  5  6
5   1   d      6  6  7

What I would like to do, is to groupby ID and ID2 and to average the SCORE taking into consideration only UNIQUE scores.

Now, if I use the standard df.groupby(['ID', 'ID2'])['SCORE'].mean() I would get 23.33~, where what I am looking for is a score of 25.

I know I can filter out X and Y, drop the duplicates and do that, but I want to keep them as they are relevant.

How can I achieve that?

Upvotes: 6

Views: 9158

Answers (3)

piRSquared
piRSquared

Reputation: 294516

You can get the unique scores within groups of ('ID', 'ID2') by dropping duplicates before hand.

cols = ['ID', 'ID2', 'SCORE']
d1 = df.drop_duplicates(cols)
d1.groupby(cols[:-1]).SCORE.mean()

ID  ID2
0   a      10
    b      25
1   c       5
    d       6
Name: SCORE, dtype: int64

Upvotes: 2

Zero
Zero

Reputation: 77027

You could also use

In [108]: df.drop_duplicates(['ID', 'ID2', 'SCORE']).groupby(['ID', 'ID2'])['SCORE'].mean()
Out[108]:
ID  ID2
0   a      10
    b      25
1   c       5
    d       6
Name: SCORE, dtype: int64

Upvotes: 1

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210972

If i understand correctly:

In [41]: df.groupby(['ID', 'ID2'])['SCORE'].agg(lambda x: x.unique().sum()/x.nunique())
Out[41]:
ID  ID2
0   a      10
    b      25
1   c       5
    d       6
Name: SCORE, dtype: int64

or bit easier:

In [43]: df.groupby(['ID', 'ID2'])['SCORE'].agg(lambda x: x.unique().mean())
Out[43]:
ID  ID2
0   a      10
    b      25
1   c       5
    d       6
Name: SCORE, dtype: int64

Upvotes: 11

Related Questions