theamazinggob
theamazinggob

Reputation: 11

Pandas Rolling Change Thresholds

Let's say I have two columns, 'a' and 'b', of time series data in a Pandas dataframe. I want to create a third column that indicates if the difference between column 'a' at the current time period and column 'b' at any of the next 5 time periods increases by 8 or more before it decreases by 2 or more. Ideally I'd do this using some form of df.rolling(5).apply() and without any loops, but I keep running into challenges.

For demonstration's sake, I've written the logic out with loops but if anyone can give me some direction on how to do this more efficiently or elegantly, I'd really appreciate it. In reality the dataframe and the window will be much larger.

df = pd.DataFrame({'a':[1,2,3,4,5,6,7,8,9,10], 'b':[1,0,9,0,15,0,20,15,23,6]})
df['c'] = 0

window = 5
positive_thresh = 8
negative_thresh = -2
num_rows = df.shape[0]

for a_idx in range(num_rows):
    a_start = df.iloc[a_idx,0]
    b_roll = df.iloc[(a_idx + 1):max(a_idx + 1 + window,num_rows), 1]
    deltas = b_roll - a_start
    positives = deltas[deltas>=positive_thresh]
    negatives = deltas[deltas<=negative_thresh]
    first_pos_idx = positives.index[0] if len(positives) > 0 else num_rows
    first_neg_idx = negatives.index[0] if len(negatives) > 0 else num_rows
    
    if first_pos_idx < first_neg_idx:
        df.iloc[a_idx,2] = 1

print(df)

    a   b  c
0   1   1  1
1   2   0  0
2   3   9  0
3   4   0  1
4   5  15  0
5   6   0  1
6   7  20  1
7   8  15  1
8   9  23  0
9  10   6  0

Upvotes: 1

Views: 166

Answers (1)

fsl
fsl

Reputation: 3280

This is a mouthful to handle with masks only, but here is one approach:

from numpy.lib.stride_tricks import sliding_window_view

window = 5
n_rows = df.shape[0]

dfa = df.reindex(np.arange(df.shape[0] + window))  # Just so that the sliding view matches
b_roll = sliding_window_view(dfa.b, 5)[1:]
diff = (b_roll.T - df.a.values).T  # diff next 5 "b" rows  with current "a"

pos = (diff >= 8)
pos_idx = pos.argmax(1)
pos_idx[pos.sum(1) == 0] = n_rows  # differ first idx vs. no occurences found

neg = (diff <= -2)
neg_idx = window - neg[:, ::-1].argmax(1) - 1  # getting the last occurence col-wise
neg_idx[neg.sum(1) == 0] = 0  # differ first idx vs. no occurences found

df["c"] = (pos_idx < neg_idx).astype(int)

If you'll notice, my suggested output does not quite match yours. I believe your snippet does not fully represent your description, but I might just have misunderstood something in the logic.

Output:

    a   b  c
0   1   1  0
1   2   0  1
2   3   9  1
3   4   0  1
4   5  15  0
5   6   0  0
6   7  20  0
7   8  15  1
8   9  23  0
9  10   6  0

Upvotes: 1

Related Questions