Reputation: 197
Good afternoon, lets say I have this dataframe.
id event_date event_type
1 8/5/2018 1
2 8/5/2018 1
3 8/7/2018 1
1 8/10/2018 1
3 8/14/2018 1
2 8/16/2018 1
1 8/19/2018 1
3 8/28/2018 1
I want to check if an id appears more than once in a 10 days timeframe.
I am able to get how many times the id appears with
counts = df['id'].value_counts()
df[df['id'].isin(counts.index[counts > 1])]
But im stuck with the part of only when into timeframe.
In this scenario I would need only to get
id times
1 2
3 1
Thanks in advance
Upvotes: 2
Views: 752
Reputation: 76917
Use could groupby
on id
and get event_date
difference for less than 10 days, then sum
the instances.
In [442]: (df.groupby('id')['event_date'].diff().dt.days.lt(10)
.groupby(df['id']).sum()
.reset_index(name='times'))
Out[442]:
id times
0 1 2.0
1 2 0.0
2 3 1.0
Details
In [445]: df.groupby('id')['event_date'].diff()
Out[445]:
0 NaT
1 NaT
2 NaT
3 5 days
4 7 days
5 11 days
6 9 days
7 14 days
Name: event_date, dtype: timedelta64[ns]
In [446]: df.groupby('id')['event_date'].diff().dt.days
Out[446]:
0 NaN
1 NaN
2 NaN
3 5.0
4 7.0
5 11.0
6 9.0
7 14.0
Name: event_date, dtype: float64
In [447]: df.groupby('id')['event_date'].diff().dt.days.lt(10)
Out[447]:
0 False
1 False
2 False
3 True
4 True
5 False
6 True
7 False
Name: event_date, dtype: bool
In [448]: df.groupby('id')['event_date'].diff().dt.days.lt(10).groupby(df.id).sum()
Out[448]:
id
1 2.0
2 0.0
3 1.0
Name: event_date, dtype: float64
Upvotes: 2