Reputation: 51
How to find the highest count of sequential occurrences such as same number, increasing value or decreasing value in the same column.
so given something like:
h_diff l_diff monotonic
timestamp
2000-01-18 NaN NaN NaN
2000-01-19 2.75 2.93 1.0
2000-01-20 12.75 10.13 1.0
2000-01-21 -7.25 -3.31 0.0
2000-01-24 -1.50 -5.07 0.0
2000-01-25 0.37 -2.75 1.0
2000-01-26 1.07 7.38 1.0
2000-01-27 -1.19 -2.75 0.0
2000-01-28 -2.13 -6.38 0.0
2000-01-31 -7.00 -6.12 0.0
the highest value of monotonicity for positive values in h_diff is 2 and for negative values it is 3. same for l_diff. so given a rolling of 10 or n, how would i find the highest monotonic count while still being able to change the window size dynamically?
this gives me the 1.0 value for the monotonic column: lambda x: np.all(np.diff(x) > 0) and lambda x: np.count_nonzero(np.diff(x) > 0) will count the total count of 1.0 for the window but what i am trying to find is the longest run in a series of a given window.
What I am hoping for is something like:
h_diff l_diff monotonic
timestamp
2000-01-18 NaN NaN NaN
2000-01-19 2.75 2.93 1.0
2000-01-20 12.75 10.13 2.0
2000-01-21 -7.25 -3.31 0.0
2000-01-24 -1.50 -5.07 0.0
2000-01-25 0.37 -2.75 1.0
2000-01-26 1.07 7.38 2.0
2000-01-27 1.19 -2.75 3.0
2000-01-28 2.13 -6.38 4.0
2000-01-31 -7.00 -6.12 0.0
Upvotes: 1
Views: 320
Reputation: 30920
use GroupBy.cumcount
+ Series.where
.
Initial DataFrame
h_diff l_diff
timestamp
2000-01-18 NaN NaN
2000-01-19 2.75 2.93
2000-01-20 12.75 10.13
2000-01-21 -7.25 -3.31
2000-01-24 -1.50 -5.07
2000-01-25 0.37 -2.75
2000-01-26 1.07 7.38
2000-01-27 1.19 -2.75
2000-01-28 2.13 -6.38
2000-01-31 -7.00 -6.12
h = df['h_diff'].gt(0)
#h = np.sign(df['h_diff'])
df['monotonic_h']=h.groupby(h.ne(h.shift()).cumsum()).cumcount().add(1).where(h,0)
print(df)
h_diff l_diff monotonic_h
timestamp
2000-01-18 NaN NaN 0
2000-01-19 2.75 2.93 1
2000-01-20 12.75 10.13 2
2000-01-21 -7.25 -3.31 0
2000-01-24 -1.50 -5.07 0
2000-01-25 0.37 -2.75 1
2000-01-26 1.07 7.38 2
2000-01-27 1.19 -2.75 3
2000-01-28 2.13 -6.38 4
2000-01-31 -7.00 -6.12 0
df['monotonic_h'].max()
#4
Detail
h.ne(h.shift()).cumsum()
timestamp
2000-01-18 1
2000-01-19 2
2000-01-20 2
2000-01-21 3
2000-01-24 3
2000-01-25 4
2000-01-26 4
2000-01-27 4
2000-01-28 4
2000-01-31 5
Name: h_diff, dtype: int64
UPDATE
df = df.join( h.groupby(h.ne(h.shift()).cumsum()).cumcount().add(1)
.to_frame('values')
.assign(monotic = np.where(h,'monotic_h_greater_0',
'monotic_h_not_greater_0'),
index = lambda x: x.index)
.where(df['h_diff'].notna())
.pivot_table(columns = 'monotic',
index = 'index',
values = 'values',
fill_value=0) )
print(df)
h_diff l_diff monotic_h_greater_0 monotic_h_not_greater_0
timestamp
2000-01-18 NaN NaN NaN NaN
2000-01-19 2.75 2.93 1.0 0.0
2000-01-20 12.75 10.13 2.0 0.0
2000-01-21 -7.25 -3.31 0.0 1.0
2000-01-24 -1.50 -5.07 0.0 2.0
2000-01-25 0.37 -2.75 1.0 0.0
2000-01-26 1.07 7.38 2.0 0.0
2000-01-27 1.19 -2.75 3.0 0.0
2000-01-28 2.13 -6.38 4.0 0.0
2000-01-31 -7.00 -6.12 0.0 1.0
Upvotes: 1
Reputation: 8033
The code below should do the trick of finding the sequential occurrences of positive or negative number. Code below is for column h_diff
df1[df1.h_diff.gt(0)].index.to_series().diff().ne(1).cumsum().value_counts().max() #sequential occurrences greater than 0
df1[df1.h_diff.lt(0)].index.to_series().diff().ne(1).cumsum().value_counts().max() #sequential occurrences less than 0
Upvotes: 1