Reputation: 12935

Find the column name which has the maximum value for each row

I have a DataFrame like this one:

Communications and Search   Business    General Lifestyle
0   0.745763    0.050847    0.118644    0.084746
0   0.333333    0.000000    0.583333    0.083333
0   0.617021    0.042553    0.297872    0.042553
0   0.435897    0.000000    0.410256    0.153846
0   0.358974    0.076923    0.410256    0.153846

I want to create a new column comprised of the column labels of each row’s maximum value. The desired output is like this:

Communications and Search   Business    General Lifestyle  Max
0   0.745763    0.050847    0.118644    0.084746           Communications 
0   0.333333    0.000000    0.583333    0.083333           Business  
0   0.617021    0.042553    0.297872    0.042553           Communications 
0   0.435897    0.000000    0.410256    0.153846           Communications 
0   0.358974    0.076923    0.410256    0.153846           Business

Upvotes: 215

Answers (5)

cottontail

Reputation: 23449

Another solution is to flag the position of the maximum values of each row and get the corresponding column names. In particular, this solution works well if multiple columns contain the maximum value for some rows and you want to return all column names with the maximum value for each row:¹

Code:

# look for the max values in each row
mxs = df.eq(df.max(axis=1), axis=0)
# join the column names of the max values of each row into a single string
df['Max'] = mxs.dot(mxs.columns + ', ').str.rstrip(', ')

A slight variation: If you want to pick one column randomly when multiple columns contain the maximum value:

Code:

mxs = df.eq(df.max(axis=1), axis=0)
df['Max'] = mxs.where(mxs).stack().groupby(level=0).sample(n=1).index.get_level_values(1)

You can also do this for specific columns by selecting the columns:

# for column names of max value of each row
cols = ['Communications', 'Search', 'Business']
mxs = df[cols].eq(df[cols].max(axis=1), axis=0)
df['max among cols'] = mxs.dot(mxs.columns + ', ').str.rstrip(', ')

^{1: idxmax(1) returns only the first column name with the max value if the max value is the same for multiple columns, which may not be desirable depending on the use case. This solution generalizes idxmax(1); in particular, if the max values are unique in each row, it matches the idxmax(1) solution.}

Upvotes: 7

emada

Reputation: 11

Using numpy argmax is blazing fast. I've tested in a dataframe with 3,744,965 rows and it takes 103ms.

%timeit df.idxmax(axis=1)
7.67 s ± 28.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit df.columns[df.to_numpy().argmax(axis=1)]
103 ms ± 355 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Upvotes: 1

Alex Riley

Reputation: 177048

You can use idxmax with axis=1 to find the column with the greatest value on each row:

>>> df.idxmax(axis=1)
0    Communications
1          Business
2    Communications
3    Communications
4          Business
dtype: object

To create the new column 'Max', use df['Max'] = df.idxmax(axis=1).

To find the row index at which the maximum value occurs in each column, use df.idxmax() (or equivalently df.idxmax(axis=0)).

Upvotes: 308

user1718097

Reputation: 4292

And if you want to produce a column containing the name of the column with the maximum value but considering only a subset of columns then you use a variation of @ajcr's answer:

df['Max'] = df[['Communications','Business']].idxmax(axis=1)

Upvotes: 57

Zero

Reputation: 77027

You could apply on dataframe and get argmax() of each row via axis=1

In [144]: df.apply(lambda x: x.argmax(), axis=1)
Out[144]:
0    Communications
1          Business
2    Communications
3    Communications
4          Business
dtype: object

Here's a benchmark to compare how slow apply method is to idxmax() for len(df) ~ 20K

In [146]: %timeit df.apply(lambda x: x.argmax(), axis=1)
1 loops, best of 3: 479 ms per loop

In [147]: %timeit df.idxmax(axis=1)
10 loops, best of 3: 47.3 ms per loop

Upvotes: 14

Find the column name which has the maximum value for each row

Answers (5)

Related Questions