Tanakorn Taweepoka
Tanakorn Taweepoka

Reputation: 197

How to get the number of previous nearest row which is larger than or equal current row in pandas?

Here is my table.

Day     Close
1       1
2       2
3       6
4       4
5       5
6       4
7       5

And I want out put like this.

Day     Close   Previous_nearest_larger_row
1       1       Nan
2       2       0
3       6       0
4       4       1
5       5       2
6       4       1
7       5       2

If the current row is the largest, I want the result is 0.

Upvotes: 0

Views: 172

Answers (1)

Corralien
Corralien

Reputation: 120429

The code uses conditions and loops so it's probably non-optimal for a large dataframe but it works!

sr = df["Close"]
distance = []

for idx, val in sr.iteritems():
    msk = sr[idx::-1] - val >= 0
    if msk.sum():
        dist = idx - msk.idxmax()
    else:
        dist = 0
    distance.append(dist)

df["Previous_nearest_larger_row"] = distance
>>> df
   Day  Close  Previous_nearest_larger_row
0    1      1                            0
1    2      2                            0
2    3      6                            0
3    4      4                            1
4    5      5                            2
5    6      4                            1
6    7      5                            2

Example for idx=4 and val=5

Slice the series from the index excluded ([1:]) to the beginning (-1), then subtract the current value and finally, replace values where the condition is true (>= 0):

>>> sr[idx::-1][1:]
3    4  # distance from idx == 1
2    6  # distance from idx == 2
1    2  # distance from idx == 3
0    1  # distance from idx == 4
Name: Close, dtype: int64

>>> sr[idx::-1][1:] - val
3   -1
2    1  # 6 - val >= 0 
1   -3
0   -4
Name: Close, dtype: int64

>>> sr[idx::-1][1:] - val >= 0
3    False
2     True  # idx - 2 = 2 <- distance
1    False
0    False
Name: Close, dtype: bool

If there is at least a True value (msk.sum()), compute the distance between current index and the index of the first True value (msk.idxmax()):

>>> idx - msk.idxmax()
2

NumPy

A bit more complex than the previous solution but maybe more scalable:

v = df["Close"].astype(float).values
s = v.size
i = np.arange(s)

v = np.full((s, s), v)
v = v - v.T
v[np.triu_indices(s)] = np.nan

r = i - (s - np.argmax(v[:, ::-1] >= 0, axis=1) - 1)
r[r < 0] = 0
>>> r
array([0, 0, 0, 1, 2, 1, 2])

Upvotes: 2

Related Questions