Emil

Reputation: 1722

Find the most common cell in a dataframe per group of another column

I have a dataset with a column id and a column lang.

>>> all_transcripts

id  lang
1   nl   
1   nl
1   fr
1   nl
2   en
2   nl
2   en
3   nl
3   nl
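To make the example reproducible, the listing above can be rebuilt as a DataFrame. This is a minimal sketch: the values are copied from the listing, and building it with pd.DataFrame is an assumption about how the data was loaded.

```python
import pandas as pd

# Rebuild the sample frame from the listing; the name all_transcripts follows the question
all_transcripts = pd.DataFrame({
    'id':   [1, 1, 1, 1, 2, 2, 2, 3, 3],
    'lang': ['nl', 'nl', 'fr', 'nl', 'en', 'nl', 'en', 'nl', 'nl'],
})
print(all_transcripts)
```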

Now I want to create a column actual_lang that shows the most common lang per id. My desired output looks as follows:

id  lang    actual_lang
1   nl      nl
1   nl      nl
1   fr      nl
1   nl      nl
2   en      en
2   nl      en
2   en      en
3   nl      nl
3   nl      nl

I have found the question "Pandas: Find most common string per person". However, there the result is based on two columns, and it is returned as one row per group instead of being added as a column to the original dataset.
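The gap described above, a per-group result versus a per-row column, can be bridged by aggregating to one value per id and then mapping that result back onto every row with Series.map. This is a minimal sketch and an alternative to a single transform call, not the code from the linked question. It assumes the sample data is loaded as df, and it uses Series.mode with .iat[0] to pick one value per group.

```python
import pandas as pd

df = pd.DataFrame({
    'id':   [1, 1, 1, 1, 2, 2, 2, 3, 3],
    'lang': ['nl', 'nl', 'fr', 'nl', 'en', 'nl', 'en', 'nl', 'nl'],
})

# Aggregate: one most common lang per id (a Series indexed by id)
per_id = df.groupby('id')['lang'].agg(lambda s: s.mode().iat[0])
print(per_id)

# Broadcast the per-id result back onto every row by looking up each row's id
df['actual_lang'] = df['id'].map(per_id)
print(df)
```

The two-step form is handy when the per-group table is needed on its own as well.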

How can I do this?

Upvotes: 1

Views: 85

Answers (1)

jezrael

Reputation: 862511

Use GroupBy.transform with Series.mode and select the first value (here df is the all_transcripts frame from the question):

# transform broadcasts each group's first mode back to every row of that group
df['actual_lang'] = df.groupby('id')['lang'].transform(lambda x: x.mode().iat[0])
print(df)
   id lang actual_lang
0   1   nl          nl
1   1   nl          nl
2   1   fr          nl
3   1   nl          nl
4   2   en          en
5   2   nl          en
6   2   en          en
7   3   nl          nl
8   3   nl          nl
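A self-contained version of the transform approach follows. It also shows what happens on a tie: Series.mode returns all tied values in sorted order, so .iat[0] picks the alphabetically first one. The extra id 4 (one 'nl', one 'fr') is hypothetical and was added only to demonstrate the tie.

```python
import pandas as pd

# Sample data from the question, plus a hypothetical id 4 with a tie ('nl' vs 'fr')
df = pd.DataFrame({
    'id':   [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4],
    'lang': ['nl', 'nl', 'fr', 'nl', 'en', 'nl', 'en', 'nl', 'nl', 'nl', 'fr'],
})

# transform returns a Series aligned with df's index, so it can be assigned directly;
# on a tie, mode() yields the tied values sorted and .iat[0] takes the first
df['actual_lang'] = df.groupby('id')['lang'].transform(lambda x: x.mode().iat[0])
print(df)
```

If a different tie-breaking rule is needed (for example, the language that appears first in each group), the lambda is the place to change it.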

Upvotes: 2
