Gautham Kanthasamy
Gautham Kanthasamy

Reputation: 229

How do I obtain the second highest value in a row?

I want to obtain the second highest value of a certain section for each row from a dataframe. How do I do this?

I have tried the following code but it doesn't work:

df.iloc[:, 5:-3].nlargest(2)(axis=1, level=2)

Is there any other way to obtain this?

Upvotes: 10

Views: 13878

Answers (4)

ekofman
ekofman

Reputation: 419

One fun option is a creative of the pandas 'replace' functionality.

First, get the maximum values for each row:

max_values = values.max()

Then subtract these maximum values from your overall data frame -- then, the positions that had that maximum value will then be zeroed out.

values_with_max_pos_zeroed_out = values_for_cycle - max_values_for_cycle

Then, when you replace the zeros with nan, the max function ignores nans such that you can take that difference from the original max values, and add it back to the max values to get what the second highest values are for each row:

max_values+(values_with_max_pos_zeroed_out ).replace(0, np.nan).max()

Upvotes: 0

sgDysregulation
sgDysregulation

Reputation: 4417

Using apply with axis=1 you can find the second largest value for each row. by finding the first 2 largest and then getting the last of them

df.iloc[:, 5:-3].apply(lambda row: row.nlargest(2).values[-1],axis=1)

Example

The code below find the second largest value in each row of df.

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: df = pd.DataFrame({'Col{}'.format(i):np.random.randint(0,100,5) for i in range(5)})

In [4]: df
Out[4]: 
   Col0  Col1  Col2  Col3  Col4
0    82    32    14    62    90
1    62    32    74    62    72
2    31    79    22    17     3
3    42    54    66    93    50
4    13    88     6    46    69

In [5]: df.apply(lambda row: row.nlargest(2).values[-1],axis=1)
Out[5]: 
0    82
1    72
2    31
3    66
4    69
dtype: int64

Upvotes: 13

jezrael
jezrael

Reputation: 862481

I think you need sorting per rows and then select:

a = np.sort(df.iloc[:, 5:-3], axis=1)[:, -2]

Sample:

np.random.seed(100)
df = pd.DataFrame(np.random.randint(10, size=(10,10)))
print (df)
   0  1  2  3  4  5  6  7  8  9
0  8  8  3  7  7  0  4  2  5  2
1  2  2  1  0  8  4  0  9  6  2
2  4  1  5  3  4  4  3  7  1  1
3  7  7  0  2  9  9  3  2  5  8
4  1  0  7  6  2  0  8  2  5  1
5  8  1  5  4  2  8  3  5  0  9
6  3  6  3  4  7  6  3  9  0  4
7  4  5  7  6  6  2  4  2  7  1
8  6  6  0  7  2  3  5  4  2  4
9  3  7  9  0  0  5  9  6  6  5

print (df.iloc[:, 5:-3])
   5  6
0  0  4
1  4  0
2  4  3
3  9  3
4  0  8
5  8  3
6  6  3
7  2  4
8  3  5
9  5  9

a = np.sort(df.iloc[:, 5:-3], axis=1)[:, -2]
print (a)
[0 0 3 3 0 3 3 2 3 5]

If need both values:

a = df.iloc[:, 5:-3].values
b = pd.DataFrame(a[np.arange(len(a))[:, None], np.argsort(a, axis=1)])
print (b)
   0  1
0  0  4
1  0  4
2  3  4
3  3  9
4  0  8
5  3  8
6  3  6
7  2  4
8  3  5
9  5  9

Upvotes: 5

RdlP
RdlP

Reputation: 1396

You need to sort your dataframe with numpy.sort() and then get the second value.

import numpy as np
second = np.sort(df.iloc[:, 5:-3], axis=1)[:, 1]

Upvotes: -1

Related Questions