gizq
gizq

Reputation: 197

Pandas Dataframe check if id appears more than 1 in time interval

Good afternoon, lets say I have this dataframe.

id  event_date  event_type
1   8/5/2018    1
2   8/5/2018    1
3   8/7/2018    1
1   8/10/2018   1
3   8/14/2018   1
2   8/16/2018   1
1   8/19/2018   1
3   8/28/2018   1

I want to check if an id appears more than once in a 10 days timeframe.

I am able to get how many times the id appears with

counts = df['id'].value_counts()

df[df['id'].isin(counts.index[counts > 1])]

But im stuck with the part of only when into timeframe.

In this scenario I would need only to get

id times
1  2
3  1

Thanks in advance

Upvotes: 2

Views: 752

Answers (1)

Zero
Zero

Reputation: 76917

Use could groupby on id and get event_date difference for less than 10 days, then sum the instances.

In [442]: (df.groupby('id')['event_date'].diff().dt.days.lt(10)
             .groupby(df['id']).sum()
             .reset_index(name='times'))
Out[442]:
   id  times
0   1    2.0
1   2    0.0
2   3    1.0

Details

In [445]: df.groupby('id')['event_date'].diff()
Out[445]:
0       NaT
1       NaT
2       NaT
3    5 days
4    7 days
5   11 days
6    9 days
7   14 days
Name: event_date, dtype: timedelta64[ns]

In [446]: df.groupby('id')['event_date'].diff().dt.days
Out[446]:
0     NaN
1     NaN
2     NaN
3     5.0
4     7.0
5    11.0
6     9.0
7    14.0
Name: event_date, dtype: float64

In [447]: df.groupby('id')['event_date'].diff().dt.days.lt(10)
Out[447]:
0    False
1    False
2    False
3     True
4     True
5    False
6     True
7    False
Name: event_date, dtype: bool

In [448]: df.groupby('id')['event_date'].diff().dt.days.lt(10).groupby(df.id).sum()
Out[448]:
id
1    2.0
2    0.0
3    1.0
Name: event_date, dtype: float64

Upvotes: 2

Related Questions