LaGabriella
LaGabriella

Reputation: 51

Count number of a specific conditional event for unique id in a rolling window over time with pandas

I have the first DataFrame and I would like to get the second one with the new column counter.

Logic: using a rolling window (example df.rolling('1min').agg(lambda ...)), for each unique id in the window, the column "counter" is 1 if "full" or "partial" is found at least one time. Once it is found for the first time, we write 1 and we don't need to check further in the window.

For example, below, the id '2' had two times the 'partial event' within the same window, counter is therefore 0.

Counter is always 0 or 1.

Note that there are other events other than 'full' and 'partial' in the column event which we should ignore.

timestamp                id  event    
2021-10-26 10:00:00.000  1   full     
2021-10-26 10:00:01.000  2   partial  
2021-10-26 10:00:03.090  3   full     
2021-10-26 10:00:05.090  2   partial  
2021-10-26 10:00:05.590  4   event_z
2021-10-26 10:00:05.690  4   event_z
2021-10-26 10:00:05.790  4   event_b


timestamp                id  event    counter
2021-10-26 10:00:00.000  1   full     1
2021-10-26 10:00:01.000  2   partial  1
2021-10-26 10:00:03.090  3   full     1
2021-10-26 10:00:05.090  2   partial  0
2021-10-26 10:00:05.090  4   event_z  0
2021-10-26 10:00:05.090  4   event_z  0
2021-10-26 10:00:05.090  4   event_b  0

df['counter'].rolling('1min').agg(lambda ... ?)

I am having hard time to write a complicate formula as lambda function, maybe better to split the problem in two and create intermediate columns ?

thanks in advance !

Upvotes: 0

Views: 204

Answers (1)

jezrael
jezrael

Reputation: 863351

IIUC use:

m = df['event'].isin(['full','partial'])
df.loc[m, 'new'] = pd.factorize(df.loc[m, 'event'])[0]

def f(x):
    a = x.duplicated() & x.notna()
    return ~a.any()
df = df.groupby('id')['new'].rolling('1min').apply(f).fillna(0).reset_index(level=0).sort_index()
print (df)
                         id  new
timestamp                       
2021-10-26 10:00:00.000   1  1.0
2021-10-26 10:00:01.000   2  1.0
2021-10-26 10:00:03.090   3  1.0
2021-10-26 10:00:05.090   2  0.0
2021-10-26 10:00:05.590   4  0.0
2021-10-26 10:00:05.690   4  0.0
2021-10-26 10:00:05.790   4  0.0

Upvotes: 0

Related Questions