Pandas/numpy: rolling mean reset when row value is a multiple of it, with minimum size

Question

I have a dataframe with multiple columns. For each column, I want to return an indicator (1 or -1), changing if the column's rolling mean resets when the current row value is a defined multiple of the rolling mean value. An added complication is that I don't want the reset to occur until the rolling window has reached a minimum length, following the previous reset.

Here's an example with multiple 2, minimum length 4, and a starting indicator value of 1

Starting dataframe:

df = pd.DataFrame(
    {
        "A": [0.1, 0.1, 0.15, 0.1, 0.1, 0.7, 0.1, 0.1, 0.5, 1, 0.1, 0.1],
        "B": [0.1, 0.1, 0.4, 0.1, 0.8, 0.1, 0.1, 0.1, 0.1, 0.1, 0.9, 0.1],
    },
    index=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
)
print(df)

       A    B
0   0.10  0.1
1   0.10  0.1
2   0.15  0.4
3   0.10  0.1
4   0.10  0.8
5   0.70  0.1
6   0.10  0.1
7   0.10  0.1
8   0.50  0.1
9   1.00  0.1
10  0.10  0.9
11  0.10  0.1

I would like to obtain:

Looking at column A for example:

first reset happens on row 5, as 0.7 >= 2 * mean(rows 0 to 5), and we have observed 6 rows
nothing happens on row 8 as the new rolling mean only has 3 observations
next reset happens on row 9, as 1 >= 2 * mean(rows 6 to 9), and we have observed 4 rows

I have looked at various posts, the closest being the below, but none seems to really address that problem fully:

Andrej Kesely · Accepted Answer

You can try:

def rolling_mean(col, start=1, multiple=2, min_count=4):
    curr = start
    num_obs = 0
    acc = 0

    for v in col:
        acc += v
        num_obs += 1

        if num_obs < min_count:
            yield curr
            continue

        m = acc / num_obs
        if v >= multiple * m:
            curr *= -1
            num_obs = 0
            acc = 0

        yield curr


df["new_A"] = list(rolling_mean(df["A"]))
df["new_B"] = list(rolling_mean(df["B"]))

print(df)

Prints:

       A    B  new_A  new_B
0   0.10  0.1      1      1
1   0.10  0.1      1      1
2   0.15  0.4      1      1
3   0.10  0.1      1      1
4   0.10  0.8      1     -1
5   0.70  0.1     -1     -1
6   0.10  0.1     -1     -1
7   0.10  0.1     -1     -1
8   0.50  0.1     -1     -1
9   1.00  0.1      1     -1
10  0.10  0.9      1      1
11  0.10  0.1      1      1

You can speed-up the computation using numba:

from numba import njit


@njit
def rolling_mean_numba(col, start=1, multiple=2, min_count=4):
    curr = start
    num_obs = 0
    acc = 0

    out = np.empty_like(col, dtype=np.int8)

    for i, v in enumerate(col):
        acc += v
        num_obs += 1

        if num_obs < min_count:
            out[i] = curr
            continue

        m = acc / num_obs
        if v >= multiple * m:
            curr *= -1
            num_obs = 0
            acc = 0

        out[i] = curr

    return out

df["new_A"] = rolling_mean_numba(df["A"].values)
df["new_B"] = rolling_mean_numba(df["B"].values)

print(df)

Pandas/numpy: rolling mean reset when row value is a multiple of it, with minimum size

Answers (1)

Related Questions