set92
set92

Reputation: 324

groupby between dates in timestamp and filtering checking another column

I have a dataframe of products for which I have their inputs and outputs. And I want to do a sliding window of 30 minutes to see if a product got in and out in those 30 minutes. I was thinking in using groupby and then filter those groups to only get the ones that they have the in(1) and out(0).

But maybe there is a better way to do it that I didn't think of.

Example:

    code      timestamp                   in_out
0   104       2018-12-18 16:15:00         1
1   105       2018-12-18 16:15:00         1
2   105       2018-12-18 16:35:00         0
3   107       2018-12-19 16:15:00         1
4   104       2019-01-13 10:00:00         0
5   502       2018-12-18 16:15:00         0

Edit:

The code I talked about groupby is:

time1 = df.iloc[0]['timestamp']
time1end = time1 + timedelta(minutes=30)
grp1 = df[(df['timestamp'] >= time1) & (df['timestamp'] <= time1end)]
grp1.groupby(by='subject_code').filter(lambda x: len(x) > 1)

Upvotes: 0

Views: 297

Answers (1)

Laleh
Laleh

Reputation: 508

I propose you do it like this:

make sure that the timestamp is the format of datetime

df['timestamp'] = pd.to_datetime(df['timestamp'], infer_datetime_format=True)

Then take the codes that happened in and out (if you are sure every code was repeated exactly two times you can skip this step)

df2 = df.groupby('code').filter(lambda x: x['in_out'].mean()==0.5)

I filtered with mean==0.5 meaning once in once out, you might replace it with something smarter Now, let's solve the problem

df3 = df2.groupby('code')['timestamp'].diff().dropna()

What this piece of code does, for each code it calculates the time difference then you can take the ones less than 30 minutes.

Upvotes: 1

Related Questions