Reputation: 229
I want to obtain the second highest value of a certain section for each row from a dataframe. How do I do this?
I have tried the following code but it doesn't work:
df.iloc[:, 5:-3].nlargest(2)(axis=1, level=2)
Is there any other way to obtain this?
Upvotes: 10
Views: 13878
Reputation: 419
One fun option is a creative of the pandas 'replace' functionality.
First, get the maximum values for each row:
max_values = values.max()
Then subtract these maximum values from your overall data frame -- then, the positions that had that maximum value will then be zeroed out.
values_with_max_pos_zeroed_out = values_for_cycle - max_values_for_cycle
Then, when you replace the zeros with nan, the max function ignores nans such that you can take that difference from the original max values, and add it back to the max values to get what the second highest values are for each row:
max_values+(values_with_max_pos_zeroed_out ).replace(0, np.nan).max()
Upvotes: 0
Reputation: 4417
Using apply with axis=1 you can find the second largest value for each row. by finding the first 2 largest and then getting the last of them
df.iloc[:, 5:-3].apply(lambda row: row.nlargest(2).values[-1],axis=1)
Example
The code below find the second largest value in each row of df.
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: df = pd.DataFrame({'Col{}'.format(i):np.random.randint(0,100,5) for i in range(5)})
In [4]: df
Out[4]:
Col0 Col1 Col2 Col3 Col4
0 82 32 14 62 90
1 62 32 74 62 72
2 31 79 22 17 3
3 42 54 66 93 50
4 13 88 6 46 69
In [5]: df.apply(lambda row: row.nlargest(2).values[-1],axis=1)
Out[5]:
0 82
1 72
2 31
3 66
4 69
dtype: int64
Upvotes: 13
Reputation: 862481
I think you need sorting per rows and then select:
a = np.sort(df.iloc[:, 5:-3], axis=1)[:, -2]
Sample:
np.random.seed(100)
df = pd.DataFrame(np.random.randint(10, size=(10,10)))
print (df)
0 1 2 3 4 5 6 7 8 9
0 8 8 3 7 7 0 4 2 5 2
1 2 2 1 0 8 4 0 9 6 2
2 4 1 5 3 4 4 3 7 1 1
3 7 7 0 2 9 9 3 2 5 8
4 1 0 7 6 2 0 8 2 5 1
5 8 1 5 4 2 8 3 5 0 9
6 3 6 3 4 7 6 3 9 0 4
7 4 5 7 6 6 2 4 2 7 1
8 6 6 0 7 2 3 5 4 2 4
9 3 7 9 0 0 5 9 6 6 5
print (df.iloc[:, 5:-3])
5 6
0 0 4
1 4 0
2 4 3
3 9 3
4 0 8
5 8 3
6 6 3
7 2 4
8 3 5
9 5 9
a = np.sort(df.iloc[:, 5:-3], axis=1)[:, -2]
print (a)
[0 0 3 3 0 3 3 2 3 5]
If need both values:
a = df.iloc[:, 5:-3].values
b = pd.DataFrame(a[np.arange(len(a))[:, None], np.argsort(a, axis=1)])
print (b)
0 1
0 0 4
1 0 4
2 3 4
3 3 9
4 0 8
5 3 8
6 3 6
7 2 4
8 3 5
9 5 9
Upvotes: 5
Reputation: 1396
You need to sort your dataframe with numpy.sort()
and then get the second value.
import numpy as np
second = np.sort(df.iloc[:, 5:-3], axis=1)[:, 1]
Upvotes: -1