sergey_208

Reputation: 654

Find the index that gives the second largest difference between two lists in Python

I have two lists of equal length and would like to find the index of the row with the second-largest absolute difference between them.

import random
import pandas as pd
random.seed(2)
l1 = pd.DataFrame([random.randrange(100) for _ in range(10)])
l2 = pd.DataFrame([random.randrange(100) for _ in range(10)])

l1-l2

    0
0 -20
1 -66
2   6
3 -28
4 -66
5  74
6  30
7 -42
8 -18
9 -15

I can use idxmax() to get the index of the largest absolute difference, which is row 5. How can I get the index of the second-largest absolute difference?

(l1 - l2).abs().idxmax()
0    5
dtype: int64

Upvotes: 0

Views: 63

Answers (2)

lorenzo

Reputation: 13

You could identify the largest absolute difference with idxmax() then remove it from the list via its index and use idxmax() again, which then would give you the index of the second-largest absolute difference.

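# Work on the single column as a Series of absolute differences
l = (l1 - l2)[0].abs()
largest_index = l.idxmax()
# Drop the row holding the largest value, then look again
l = l.drop(largest_index)
l.idxmax()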

It is not quite clear whether the returned index needs to stay aligned with the original (l1 - l2). If it does, this option keeps every row in place:

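# Absolute differences as a Series; abs() returns a new object
l = (l1 - l2)[0].abs()
largest_index = l.idxmax()
# Zero out the largest value instead of removing its row
l.loc[largest_index] = 0
l.idxmax()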

By setting the largest absolute difference to zero, a second call to idxmax() gives the index of the second-largest absolute difference without changing the size or the order of the data.
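For reference, the drop-and-repeat idea can be wrapped in a small helper. This is only a sketch: the function name second_largest_abs_index is made up for illustration, and the values are copied from the printed l1 - l2 in the question rather than regenerated with random, so it does not depend on the seed.

```python
import pandas as pd

# Differences copied from the question's printed output of l1 - l2
# (hard-coded here so the example does not depend on the random seed).
diff = pd.Series([-20, -66, 6, -28, -66, 74, 30, -42, -18, -15])


def second_largest_abs_index(s):
    """Return the label of the second-largest absolute value in s.

    Hypothetical helper name; the input Series is left unmodified.
    """
    abs_s = s.abs()                  # new Series of absolute values
    largest_index = abs_s.idxmax()   # label of the largest absolute value
    # Remove that row and take the maximum of what is left.
    return abs_s.drop(largest_index).idxmax()


result = second_largest_abs_index(diff)
print(result)
```

Because idxmax() returns the first label when several rows share the maximum, the tie between rows 1 and 4 is resolved in favour of row 1 here.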

Upvotes: 0

Quang Hoang

Reputation: 150745

Option 1: The easy way: sort, then slice (complexity O(n log n))

(l1 - l2).abs().sort_values([0], ascending=False).index[1]

Option 2: nlargest, then idxmin (complexity O(n)):

(l1 - l2).abs().nlargest(2, columns=[0]).idxmin()

Note that your data has two rows with an absolute difference of 66 (rows 1 and 4), so the second-largest value is tied and you may get either 1 or 4 depending on the method.
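If the tie matters, one possible way to make it explicit is to find the second-largest distinct value and list every row that carries it. This is a sketch under assumptions: it treats "second largest" as the second-largest distinct absolute value (which differs from the positional reading when the maximum itself is tied), and it hard-codes the differences printed in the question.

```python
import pandas as pd

# Absolute differences, copied from the question's printed l1 - l2.
abs_diff = pd.Series([-20, -66, 6, -28, -66, 74, 30, -42, -18, -15]).abs()

# nlargest keeps the first occurrence on ties by default (keep="first"),
# so the second entry is deterministic for this data.
first_tie = abs_diff.nlargest(2).index[1]

# Second-largest *distinct* value, then every label that has that value.
second_value = abs_diff.drop_duplicates().nlargest(2).iloc[-1]
tied_labels = abs_diff.index[abs_diff == second_value].tolist()

print(first_tie, second_value, tied_labels)
```

Returning all tied labels leaves the choice between rows 1 and 4 to the caller instead of to the sort order.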

Upvotes: 2
