Reputation: 442
I have the following df:
id score
222.0 0.0
222.0 0.0
222.0 1.0
222.0 0.0
222.0 1.0
222.0 1.0
222.0 1.0
222.0 0.0
222.0 1.0
222.0 -1.0
416.0 0.0
416.0 0.0
416.0 2.0
416.0 0.0
416.0 1.0
416.0 0.0
416.0 1.0
416.0 1.0
416.0 0.0
416.0 0.0
895.0 1.0
895.0 0.0
895.0 0.0
895.0 0.0
895.0 0.0
895.0 0.0
895.0 0.0
895.0 0.0
895.0 0.0
895.0 0.0
I want to calculate the mode of the score column for each value of id. Something like this:
id score
222.0 1.0
416.0 0.0
895.0 0.0
I tried this:
df['score'] = df.mode()['score']
But I am getting the following output:
id score
222.0 0.0
222.0 NaN
222.0 NaN
222.0 NaN
222.0 NaN
222.0 NaN
222.0 NaN
222.0 NaN
222.0 NaN
222.0 NaN
416.0 NaN
416.0 NaN
416.0 NaN
416.0 NaN
416.0 NaN
416.0 NaN
416.0 NaN
416.0 NaN
416.0 NaN
416.0 NaN
895.0 NaN
895.0 NaN
895.0 NaN
895.0 NaN
895.0 NaN
895.0 NaN
895.0 NaN
895.0 NaN
895.0 NaN
895.0 NaN
What is wrong here?
Upvotes: 1
Views: 945
Reputation: 76917
You could also use
In [79]: df.groupby('id').agg({'score': lambda x: x.value_counts().index[0]}).reset_index()
Out[79]:
id score
0 222.0 1.0
1 416.0 0.0
2 895.0 0.0
Or, use
In [80]: from scipy.stats.mstats import mode
In [81]: df.groupby('id').agg({'score': lambda x: mode(x)[0]}).reset_index()
Out[81]:
id score
0 222.0 1.0
1 416.0 0.0
2 895.0 0.0
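As for what went wrong in the original attempt: df.mode() computes the mode of each column over the whole frame and returns a short frame with its own 0-based index. Assigning df.mode()['score'] back to df['score'] aligns on that index, so only the rows whose labels exist in the result get a value and the rest become NaN. A minimal sketch of this, assuming the question's data is rebuilt by hand (the names modes and attempt are only for illustration):

```python
import pandas as pd

# Data rebuilt by hand from the question.
df = pd.DataFrame({
    "id": [222.0] * 10 + [416.0] * 10 + [895.0] * 10,
    "score": [0, 0, 1, 0, 1, 1, 1, 0, 1, -1,
              0, 0, 2, 0, 1, 0, 1, 1, 0, 0,
              1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
}).astype(float)

# Column-wise mode over the whole frame: 'id' has three tied modes,
# so the result has three rows; 'score' has a single mode and is
# padded with NaN in the remaining rows.
modes = df.mode()["score"]
print(modes)

# Assignment aligns on the index (0, 1, 2 here), so every row of df
# whose label is not in modes.index is filled with NaN.
attempt = df.copy()
attempt["score"] = modes
print(attempt["score"].notna().sum())
```

No grouping by id happens anywhere in that expression, which is why a groupby is needed.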
Upvotes: 1
Reputation: 76297
Group the scores by ids, and apply mode to each:
>>> df.score.groupby(df['id']).apply(lambda g: g.mode()).reset_index()[['id', 'score']]
id score
0 222.0 1.0
1 416.0 0.0
2 895.0 0.0
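Note that Series.mode() returns every tied value, so a group with a tie would produce more than one row. If exactly one row per id is wanted, a possible variant is to keep only the first (smallest) mode. This is a sketch under that assumption, with the question's data rebuilt by hand; the tie-breaking rule is my choice, not something the question specifies:

```python
import pandas as pd

# Data rebuilt by hand from the question.
df = pd.DataFrame({
    "id": [222.0] * 10 + [416.0] * 10 + [895.0] * 10,
    "score": [0, 0, 1, 0, 1, 1, 1, 0, 1, -1,
              0, 0, 2, 0, 1, 0, 1, 1, 0, 0,
              1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
}).astype(float)

# Series.mode() returns the tied modes in sorted order;
# .iloc[0] keeps the smallest one, giving one scalar per group.
result = (
    df.groupby("id")["score"]
      .agg(lambda s: s.mode().iloc[0])
      .reset_index()
)
print(result)
```

For this data there are no ties, so the result matches the output shown above.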
Upvotes: 1