Reputation: 3109
I have a Pandas dataframe that looks like this:
Index, cat1, cat2, cat3
1,0.3,0.1,0.4
2,0.5,0.1,0.2
3,0.1,0.4,0.3
I want to create a fifth column "max_cat" with the column name of the category with the highest value like so:
Index, cat1, cat2, cat3, max_cat
1,0.3,0.1,0.4, cat3
2,0.5,0.1,0.2, cat1
3,0.1,0.4,0.3, cat2
How would I go about achieving that, preferably using Pandas?
Here is my code:
import pandas as pd
from io import StringIO
data = StringIO("""
Index, cat1, cat2, cat3
1,0.3,0.1,0.4
2,0.5,0.1,0.2
3,0.1,0.4,0.3
""")
df = pd.read_csv(data, skiprows=1, header=0, names=["cat1","cat2","cat3"])
Upvotes: 2
Views: 1461
Reputation: 77951
if you are not worrying about ties, then you can use idxmax
, with axis=1
on the cati
columns:
>>> df['max_cat'] = df[['cat1', 'cat2', 'cat3']].idxmax(axis=1)
>>> df
Index cat1 cat2 cat3 max_cat
0 1 0.3 0.1 0.4 cat3
1 2 0.5 0.1 0.2 cat1
2 3 0.1 0.4 0.3 cat2
if you want to see the ties, then see this question.
Upvotes: 2