Reputation:
In a data frame, I want to create another column which is outputs the most frequent value coming from different columns in a row.
A B C D
foo bar baz foo
egg bacon egg egg
bacon egg foo baz
The "E" column must output frequent value from a row like
E
foo
egg
How can I do it in Python?
Upvotes: 2
Views: 1581
Reputation: 888
Recreating your problem with:
df = pd.DataFrame(
{
'A' : ['foo', 'egg', 'bacon'],
'B' : ['bar', 'bacon', 'egg'],
'C' : ['baz', 'egg', 'foo'],
'D' : ['foo', 'egg', 'baz']
}
)
And solving the problem with
df['E'] = df.mode(axis=1)[0]
Output:
A B C D E
0 foo bar baz foo foo
1 egg bacon egg egg egg
2 bacon egg foo baz bacon
What happens if there is no single most frequent element?
df.mode(axis=1)
0 1 2 3
0 foo NaN NaN NaN
1 egg NaN NaN NaN
2 bacon baz egg foo
As you can see when there is a tie on being most frequent it returns the values in the most frequent set. If I swap the values foo for egg and baz for bacon in columns C and D, respectively, we get the following result:
0 1
0 foo NaN
1 egg NaN
2 bacon egg
As you can see, now the result set is only two elements, which means that the tie is between bacon and egg.
How do I detect ties?
Let us work with the dataset not containing the column D.
df
A B C
0 foo bar baz
1 egg bacon egg
2 bacon egg foo
df_m = df.mode(axis=1)
df_m
0 1 2
0 bar baz foo
1 egg NaN NaN
2 bacon egg foo
df['D'] = df_m[0]
A B C D
0 foo bar baz bar
1 egg bacon egg egg
2 bacon egg foo bacon
We can utilize the notna() method which pandas provide to create a mask to check which rows are not containing a NaN value, i.e which rows are in a tie.
First, we must drop the first column which always has a value.
df_m = df_m.drop(columns=0)
Then we need to transform the dataframe using another method .T, and check for any rows not containing NaNs.
df_mask = df_m.T.notna().any()
df_mask
0 False
1 False
2 True
dtype: bool
Now we have a pandas series of booleans. We can use this mask to overwrite the column from before.
df['D'][df_mask] = df['A'][df_mask]
A B C D
0 foo bar baz foo
1 egg bacon egg egg
2 bacon egg foo bacon
Upvotes: 3