Reputation: 1722
I have a dataset with a column id and a column lang.
>>> all_transcripts
id lang
1 nl
1 nl
1 fr
1 nl
2 en
2 nl
2 en
3 nl
3 nl
Now I want to create a column actual_lang that shows the most common lang per id. My desired output looks as follows:
id lang actual_lang
1 nl nl
1 nl nl
1 fr nl
1 nl nl
2 en en
2 nl en
2 en en
3 nl nl
3 nl nl
I have found Pandas: Find most common string per person; however, there the returned value is based on two columns, and the output is a single value per group rather than a column added to the dataset.
Who knows how to do this?
Upvotes: 1
Views: 85
Reputation: 862511
Use GroupBy.transform with Series.mode and select the first value:
df['actual_lang'] = df.groupby('id')['lang'].transform(lambda x: x.mode().iat[0])
print(df)
id lang actual_lang
0 1 nl nl
1 1 nl nl
2 1 fr nl
3 1 nl nl
4 2 en en
5 2 nl en
6 2 en en
7 3 nl nl
8 3 nl nl
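For completeness, here is a self-contained sketch of the same approach that rebuilds the sample data from the question, so it can be run as-is. The helper name most_common is only illustrative and not part of pandas. Note that Series.mode returns all tied values in sorted order, so taking the first one breaks ties alphabetically; that tie-breaking rule is an assumption about what is wanted, since the question does not say.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2, 2, 3, 3],
    "lang": ["nl", "nl", "fr", "nl", "en", "nl", "en", "nl", "nl"],
})


def most_common(values: pd.Series) -> str:
    """Return the most frequent value; ties resolve to the first in sorted order."""
    # Series.mode() returns every mode, sorted, so .iat[0] picks one deterministically.
    return values.mode().iat[0]


# transform broadcasts the per-group scalar back onto every row of that group,
# so the result has the same length and index as the original frame.
df["actual_lang"] = df.groupby("id")["lang"].transform(most_common)
print(df)
```

Because transform keeps the original index, the new column lines up row by row with the existing data, which is what distinguishes it from an aggregation that returns one row per group.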
Upvotes: 2