obewanjacobi
obewanjacobi

Reputation: 468

Compare Boolean variables within Time Window of Pandas Dataframe

I have a pandas dataframe that looks like so:

   datetime         Online   TEST
61 2018-03-03       True   False
62 2018-03-04       True   False
63 2018-03-05       True   False
64 2018-03-06       True   False
65 2018-03-07       True   False
66 2018-03-08       True   False
67 2018-03-09       True   False
68 2018-03-10       True   False
69 2018-03-11       False  False
70 2018-03-12       False  False

I need to check that for each False in the TEST column, that within a date range of 7 days that there is a False in the Online column. For example, on 2018-03-03, since TEST is False, I would want to check all plus or minus 7 days (ie plus or minus timedelta(days = 7)) for False values in the Online column. So since there are no False Online values within a 7 day time frame, then we would return a False. On the other hand, consider the date 2018-03-09, where Online is True and TEST is False. Since there is a False in Online on the day 2018-03-11, I need to return a boolean True value saying that there was a False within my 7 day time range.

I can achieve this using some slow and ugly looping mechanisms (ie go through each row using DataFrame.iterrows(), check if TEST is false, then pull the time window of plus or minus 7 days to see if Online also has a corresponding False value. But I would ideally like to have something snazzier and faster. For a visual, this is what I need my final dataframe to look like:

   datetime         Online   TEST   Check
61 2018-03-03       True   False    False
62 2018-03-04       True   False    True
63 2018-03-05       True   False    True
64 2018-03-06       True   False    True
65 2018-03-07       True   False    True
66 2018-03-08       True   False    True
67 2018-03-09       True   False    True
68 2018-03-10       True   False    True
69 2018-03-11       False  False    True
70 2018-03-12       False  False    True

Any ideas out there? Thanks in advance!

Upvotes: 2

Views: 335

Answers (1)

Lith
Lith

Reputation: 803

Building upon great @piRSquared comments (I didn't even know about the rolling method, it seems very useful!), you can use

check = ~(df.TEST + df.Online.rolling(15, center=True, min_periods=1).apply(np.prod).eq(1))

The second summand creates a Series object in which every element is a boolean indicating if there is not any False value in a window of size 15; this is achieved by multiplying (NumPy's prod function) all the values inside this rolling window.

The sum (with the logical inverse operator ~) is what compares booleans so we only get True in the Check series if there are two False in both columns (element-wise, of course).

Hope it helps.

Upvotes: 1

Related Questions