Find the most common cell in a dataframe per group of another column

Question

I have a dataset with a column id and a column lang.

>>> all_transcripts

id  lang
1   nl   
1   nl
1   fr
1   nl
2   en
2   nl
2   en
3   nl
3   nl

Now I want to create a column actual_lang that shows the most common lang per id. My desired output looks as follows:

id  lang    actual_lang
1   nl      nl
1   nl      nl
1   fr      nl
1   nl      nl
2   en      en
2   nl      en
2   en      en
3   nl      nl
3   nl      nl

I have found "Pandas: Find most common string per person", but there the returned value is based on two columns, and the output has one row per group rather than being added as a column to the original dataset.

How can I do this?

jezrael · Accepted Answer

Use GroupBy.transform with Series.mode and select the first value (df here stands for the question's all_transcripts):

df['actual_lang'] = df.groupby('id')['lang'].transform(lambda x: x.mode().iat[0])
print(df)
   id lang actual_lang
0   1   nl          nl
1   1   nl          nl
2   1   fr          nl
3   1   nl          nl
4   2   en          en
5   2   nl          en
6   2   en          en
7   3   nl          nl
8   3   nl          nl
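
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The sample frame is rebuilt by hand from the question's data, and the name tie_modes is only illustrative. One thing worth knowing: Series.mode returns all tied values in sorted order, so with .iat[0] a tie resolves to the first value in that sorted order rather than the first one seen in the group.

```python
import pandas as pd

# Sample data rebuilt by hand from the question (assumed to match all_transcripts).
df = pd.DataFrame({
    "id":   [1, 1, 1, 1, 2, 2, 2, 3, 3],
    "lang": ["nl", "nl", "fr", "nl", "en", "nl", "en", "nl", "nl"],
})

# transform broadcasts the per-group result back onto every row of that group,
# so the new column has the same length and index as the original frame.
df["actual_lang"] = df.groupby("id")["lang"].transform(lambda x: x.mode().iat[0])
print(df)

# Tie behaviour: both values occur once, and mode() returns them sorted,
# so .iat[0] would pick "en" here even though "nl" comes first in the data.
tie_modes = pd.Series(["nl", "en"]).mode().tolist()
print(tie_modes)
```

If a group could consist only of missing values, mode() would come back empty and .iat[0] would raise an IndexError, so that case would need its own handling.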
