Reputation: 83
I have a Dataframe df
:
col_1
0 51.0
1 52.0
2 51.5
3 51.5
4 53.0
5 54.0
6 52.0
7 53.0
8 50.5
9 50.0
10 52.0
I would like to count the number of rows backwards until a value greater than the current value is found, or until the first row is reached. For example, the result would be:
col_1 count
0 51.0 0
1 52.0 1
2 51.5 0
3 51.5 1
4 53.0 4
5 54.0 5
6 52.0 0
7 53.0 1
8 50.5 0
9 50.0 0
10 52.0 2
I am basically trying to find the value of n
in df['col_1'].rolling(n).max()
for each value of the series. How can I do this? Thank you in advance.
Upvotes: 3
Views: 1081
Reputation: 305
I came up with this, it may not be very effective way to do it, but it seems to work:
dr = df.iloc[::-1,0] # col_1 reversed
dr = dr.reset_index(drop=True)
count = []
for i, v in dr.iteritems():
d = dr.iloc[i:]
try:
ix = d[d > v].index[0] - i
except IndexError:
ix = len(d)
count.append(ix - 1)
df["count"] = count[::-1]
print(df)
col_1 count
0 51.0 0
1 52.0 1
2 51.5 0
3 51.5 1
4 53.0 4
5 54.0 5
6 52.0 0
7 53.0 1
8 50.5 0
9 50.0 0
10 52.0 2
Upvotes: 2
Reputation: 71689
a = df['col_1'].values[::-1]
m = np.triu(a[:, None] < a)
i = m.argmax(1)
i[~m.any(1)] = len(m)
df['count'] = (i - range(len(m)) - 1)[::-1]
col_1
and obtain a numpy view>>> a
array([52. , 50. , 50.5, 53. , 52. , 54. , 53. , 51.5, 51.5, 52. , 51. ])
a
with itself>>> a[:, None] < a
[[False False False True False True True False False False False]
[ True False True True True True True True True True True]
[ True False False True True True True True True True True]
[False False False False False True False False False False False]
[False False False True False True True False False False False]
[False False False False False False False False False False False]
[False False False False False True False False False False False]
[ True False False True True True True False False True False]
[ True False False True True True True False False True False]
[False False False True False True True False False False False]
[ True False False True True True True True True True False]]
triu
(upper triangular) mask the values to the left of main diagonal as we are only concerned with the values that come after the current value. Think of this as the main diagonal representing the current value.>>> np.triu(a[:, None] < a)
[[False False False True False True True False False False False]
[False False True True True True True True True True True]
[False False False True True True True True True True True]
[False False False False False True False False False False False]
[False False False False False True True False False False False]
[False False False False False False False False False False False]
[False False False False False False False False False False False]
[False False False False False False False False False True False]
[False False False False False False False False False True False]
[False False False False False False False False False False False]
[False False False False False False False False False False False]]
>>> i = m.argmax(1)
>>> i[~m.any(1)] = len(m)
>>> i
array([ 3, 2, 3, 5, 5, 11, 11, 9, 9, 11, 11])
>>> df['count'] = (i - range(len(m)) - 1)[::-1]
>>> df
col_1 count
0 51.0 0
1 52.0 1
2 51.5 0
3 51.5 1
4 53.0 4
5 54.0 5
6 52.0 0
7 53.0 1
8 50.5 0
9 50.0 0
10 52.0 2
Upvotes: 7