abhjt

Reputation: 442

Calculate mode of a column in Pandas using other column with same row values

I have the following df:

id            score
222.0         0.0           
222.0         0.0           
222.0         1.0           
222.0         0.0           
222.0         1.0           
222.0         1.0           
222.0         1.0           
222.0         0.0           
222.0         1.0           
222.0        -1.0           
416.0         0.0           
416.0         0.0           
416.0         2.0           
416.0         0.0           
416.0         1.0           
416.0         0.0           
416.0         1.0           
416.0         1.0           
416.0         0.0           
416.0         0.0           
895.0         1.0           
895.0         0.0           
895.0         0.0           
895.0         0.0           
895.0         0.0           
895.0         0.0           
895.0         0.0           
895.0         0.0           
895.0         0.0           
895.0         0.0

I want to calculate the mode of the score column for each distinct value of id, something like this:

id            score
222.0         1.0           
416.0         0.0           
895.0         0.0  

I tried this:

df['score'] = df.mode()['score']

But I am getting the following output:

id            score
222.0         0.0           
222.0         NaN           
222.0         NaN           
222.0         NaN           
222.0         NaN           
222.0         NaN           
222.0         NaN           
222.0         NaN           
222.0         NaN           
222.0         NaN           
416.0         NaN           
416.0         NaN           
416.0         NaN          
416.0         NaN           
416.0         NaN           
416.0         NaN           
416.0         NaN           
416.0         NaN           
416.0         NaN           
416.0         NaN           
895.0         NaN           
895.0         NaN           
895.0         NaN           
895.0         NaN           
895.0         NaN           
895.0         NaN           
895.0         NaN           
895.0         NaN           
895.0         NaN           
895.0         NaN

What is wrong here?

Upvotes: 1

Views: 945

Answers (2)

Zero

Reputation: 76917

You could also count the values within each id group and take the most frequent one:

In [79]: df.groupby('id').agg({'score': lambda x: x.value_counts().index[0]}).reset_index()
Out[79]:
      id  score
0  222.0    1.0
1  416.0    0.0
2  895.0    0.0

Or use mode from scipy:

In [80]: from scipy.stats.mstats import mode

In [81]: df.groupby('id').agg({'score': lambda x: mode(x)[0]}).reset_index()
Out[81]:
      id  score
0  222.0    1.0
1  416.0    0.0
2  895.0    0.0

Upvotes: 1

Ami Tavory

Reputation: 76297

Group the scores by ids, and apply mode to each:

>>> df.score.groupby(df['id']).apply(lambda g: g.mode()).reset_index()[['id', 'score']]
      id    score
0   222.0   1.0
1   416.0   0.0
2   895.0   0.0
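
As for what went wrong in the original attempt, my reading is that df.mode() computes the mode of each column over the whole frame and returns a small frame indexed 0, 1, 2, ... with one row per modal value. Assigning its 'score' column back to df aligns on the index, so only row 0 receives a value and every other row becomes NaN. Here is a minimal sketch that rebuilds the question's data to show this; the names whole_frame_mode, aligned, and per_id_mode are only for illustration, and taking the first mode with .iloc[0] is my own assumption about how ties should be resolved:

```python
import pandas as pd

# Rebuild the data from the question (three ids, ten scores each).
df = pd.DataFrame({
    "id": [222.0] * 10 + [416.0] * 10 + [895.0] * 10,
    "score": [0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0, -1.0,
              0.0, 0.0, 2.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0,
              1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
})

# Column-wise mode over the whole frame: one row per modal value,
# indexed from 0. It is not grouped by id.
whole_frame_mode = df.mode()

# What df['score'] = df.mode()['score'] effectively assigns: the short
# Series is aligned to df's index, so unmatched rows become NaN.
aligned = whole_frame_mode["score"].reindex(df.index)

# Per-id mode. Series.mode() returns all tied modes in sorted order;
# .iloc[0] keeps the smallest one (an assumption about tie handling).
per_id_mode = (
    df.groupby("id")["score"]
      .agg(lambda s: s.mode().iloc[0])
      .reset_index()
)
print(per_id_mode)
```

If you want to keep every tied mode rather than just one, the apply version above is the better fit, since it returns one row per modal value in each group.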

Upvotes: 1
