Marcel Mendes Reis
Marcel Mendes Reis

Reputation: 107

What is the best way to check if the last rows of a pandas dataframe meet a condition?

I got stuck trying to create a new column that is a check column based on the 'signal' column. If the last five rows(including the last) are 1, it would return 1, If the last five rows(including the last) are 0, it would return 0, everything else would be the last value of check, like this:

I have the following data frame:

       signal
index        
0           1
1           1
2           1
3           1
4           1
5           1
6           0
7           0
8           0
9           0
10          0
11          0
12          0
13          1
14          0
15          1
16          1
17          1
18          1
19          1

I'd like something like this:

       signal  check
index                
0           1       1
1           1       1
2           1       1
3           1       1
4           1       1
5           1       1
6           0       1
7           0       1
8           0       1
9           0       1
10          0       0
11          0       0
12          0       0
13          1       0
14          0       0
15          1       0
16          1       0
17          1       0
18          1       0
19          1       1

I would appreciate any kind of help!

Thank you!

Upvotes: 2

Views: 1818

Answers (2)

Andy L.
Andy L.

Reputation: 25239

Try rolling.sum to slice df and ffill, bfill

df['check'] = df[df.rolling(5).sum().isin([0, 5])].ffill().bfill()

Out[540]:
       signal  check
index
0           1    1.0
1           1    1.0
2           1    1.0
3           1    1.0
4           1    1.0
5           1    1.0
6           0    1.0
7           0    1.0
8           0    1.0
9           0    1.0
10          0    0.0
11          0    0.0
12          0    0.0
13          1    0.0
14          0    0.0
15          1    0.0
16          1    0.0
17          1    0.0
18          1    0.0
19          1    1.0

If you want check as integer, just chain addition astype(int)

df['check'] = df[df.rolling(5).sum().isin([0, 5])].ffill().bfill().astype(int)

Upvotes: 3

wingedsubmariner
wingedsubmariner

Reputation: 13667

You want to use a rolling window over your dataframe, followed by fillna:

def allSame(x):
    if (x == 1).all():
        return 1.0
    elif (x == 0).all():
        return 0.0
    else:
        return np.nan

df['signal'] = df.rolling(5).apply(allSame, raw=False).fillna(method="ffill")

rolling returns a rolling window object over a number of elements (5 in this case). The window object is similar to a dataframe, but instead of having rows it has windows over the original dataframe's rows. We can use its apply method to convert each rolling window to a value, converting the rolling window object to a dataframe. The apply method takes a function that can convert from an ndarray to an appropriate output value.

Here we pass to apply a function that returns 1 or 0 if the 5 rows in the window are all 1 or 0 respectively, and otherwise returns NaN. As a result we get a new dataframe with values that are either 1, 0, or NaN. We then use fillna on this dataframe to overwrite the NaN values with the first preceeding 1 or 0 value. Finally, we merge the resulting dataframe back into the original dataframe, creating the "signal" column.

Upvotes: 3

Related Questions