Reputation: 51
I have the first DataFrame and I would like to get the second one with the new column counter.
Logic: using a rolling window (example df.rolling('1min').agg(lambda ...)), for each unique id in the window, the column "counter" is 1 if "full" or "partial" is found at least one time. Once it is found for the first time, we write 1 and we don't need to check further in the window.
For example, below, the id '2' had two times the 'partial event' within the same window, counter is therefore 0.
Counter is always 0 or 1.
Note that there are other events other than 'full' and 'partial' in the column event which we should ignore.
timestamp id event
2021-10-26 10:00:00.000 1 full
2021-10-26 10:00:01.000 2 partial
2021-10-26 10:00:03.090 3 full
2021-10-26 10:00:05.090 2 partial
2021-10-26 10:00:05.590 4 event_z
2021-10-26 10:00:05.690 4 event_z
2021-10-26 10:00:05.790 4 event_b
timestamp id event counter
2021-10-26 10:00:00.000 1 full 1
2021-10-26 10:00:01.000 2 partial 1
2021-10-26 10:00:03.090 3 full 1
2021-10-26 10:00:05.090 2 partial 0
2021-10-26 10:00:05.090 4 event_z 0
2021-10-26 10:00:05.090 4 event_z 0
2021-10-26 10:00:05.090 4 event_b 0
df['counter'].rolling('1min').agg(lambda ... ?)
I am having hard time to write a complicate formula as lambda function, maybe better to split the problem in two and create intermediate columns ?
thanks in advance !
Upvotes: 0
Views: 204
Reputation: 863351
IIUC use:
m = df['event'].isin(['full','partial'])
df.loc[m, 'new'] = pd.factorize(df.loc[m, 'event'])[0]
def f(x):
a = x.duplicated() & x.notna()
return ~a.any()
df = df.groupby('id')['new'].rolling('1min').apply(f).fillna(0).reset_index(level=0).sort_index()
print (df)
id new
timestamp
2021-10-26 10:00:00.000 1 1.0
2021-10-26 10:00:01.000 2 1.0
2021-10-26 10:00:03.090 3 1.0
2021-10-26 10:00:05.090 2 0.0
2021-10-26 10:00:05.590 4 0.0
2021-10-26 10:00:05.690 4 0.0
2021-10-26 10:00:05.790 4 0.0
Upvotes: 0