Reputation: 468
I have a pandas dataframe that looks like so:
datetime Online TEST
61 2018-03-03 True False
62 2018-03-04 True False
63 2018-03-05 True False
64 2018-03-06 True False
65 2018-03-07 True False
66 2018-03-08 True False
67 2018-03-09 True False
68 2018-03-10 True False
69 2018-03-11 False False
70 2018-03-12 False False
I need to check that for each False
in the TEST
column, that within a date range of 7 days that there is a False
in the Online
column. For example, on 2018-03-03, since TEST
is False, I would want to check all plus or minus 7 days (ie plus or minus timedelta(days = 7)
) for False values in the Online
column. So since there are no False Online
values within a 7 day time frame, then we would return a False. On the other hand, consider the date 2018-03-09, where Online
is True and TEST
is False. Since there is a False in Online
on the day 2018-03-11, I need to return a boolean True value saying that there was a False within my 7 day time range.
I can achieve this using some slow and ugly looping mechanisms (ie go through each row using DataFrame.iterrows()
, check if TEST
is false, then pull the time window of plus or minus 7 days to see if Online
also has a corresponding False
value. But I would ideally like to have something snazzier and faster. For a visual, this is what I need my final dataframe to look like:
datetime Online TEST Check
61 2018-03-03 True False False
62 2018-03-04 True False True
63 2018-03-05 True False True
64 2018-03-06 True False True
65 2018-03-07 True False True
66 2018-03-08 True False True
67 2018-03-09 True False True
68 2018-03-10 True False True
69 2018-03-11 False False True
70 2018-03-12 False False True
Any ideas out there? Thanks in advance!
Upvotes: 2
Views: 335
Reputation: 803
Building upon great @piRSquared comments (I didn't even know about the rolling
method, it seems very useful!), you can use
check = ~(df.TEST + df.Online.rolling(15, center=True, min_periods=1).apply(np.prod).eq(1))
The second summand creates a Series object in which every element is a boolean indicating if there is not any False
value in a window of size 15; this is achieved by multiplying (NumPy's prod
function) all the values inside this rolling window.
The sum (with the logical inverse operator ~
) is what compares booleans so we only get True
in the Check
series if there are two False
in both columns (element-wise, of course).
Hope it helps.
Upvotes: 1