Reputation: 324
I have a dataframe of products for which I have their inputs and outputs. And I want to do a sliding window of 30 minutes to see if a product got in and out in those 30 minutes. I was thinking in using groupby and then filter those groups to only get the ones that they have the in(1) and out(0).
But maybe there is a better way to do it that I didn't think of.
Example:
code timestamp in_out
0 104 2018-12-18 16:15:00 1
1 105 2018-12-18 16:15:00 1
2 105 2018-12-18 16:35:00 0
3 107 2018-12-19 16:15:00 1
4 104 2019-01-13 10:00:00 0
5 502 2018-12-18 16:15:00 0
Edit:
The code I talked about groupby is:
time1 = df.iloc[0]['timestamp']
time1end = time1 + timedelta(minutes=30)
grp1 = df[(df['timestamp'] >= time1) & (df['timestamp'] <= time1end)]
grp1.groupby(by='subject_code').filter(lambda x: len(x) > 1)
Upvotes: 0
Views: 297
Reputation: 508
I propose you do it like this:
make sure that the timestamp is the format of datetime
df['timestamp'] = pd.to_datetime(df['timestamp'], infer_datetime_format=True)
Then take the codes that happened in and out (if you are sure every code was repeated exactly two times you can skip this step)
df2 = df.groupby('code').filter(lambda x: x['in_out'].mean()==0.5)
I filtered with mean==0.5 meaning once in once out, you might replace it with something smarter Now, let's solve the problem
df3 = df2.groupby('code')['timestamp'].diff().dropna()
What this piece of code does, for each code it calculates the time difference then you can take the ones less than 30 minutes.
Upvotes: 1