Measure different between timestamps using conditions - python

Question

I'm trying to measure the difference between timestamps using certain conditions. Using below, for each unique ID, I'm hoping to subtract the End Time where Item == A and the Start Time where Item == D.

So the timestamps are actually located on separate rows.

At the moment my process is returning an error. I'm also hoping to drop the .shift() for something more robust as each unique ID will have different combinations. For ex, A,B,C,D - A,B,D - A,D etc.

df = pd.DataFrame({'ID': [10,10,10,20,20,30],
               'Start Time': ['2019-08-02 09:00:00','2019-08-03 10:50:00','2019-08-05 16:00:00','2019-08-04 08:00:00','2019-08-04 15:30:00','2019-08-06 11:00:00'], 
               'End Time': ['2019-08-04 15:00:00','2019-08-04 16:00:00','2019-08-05 16:00:00','2019-08-04 14:00:00','2019-08-05 20:30:00','2019-08-07 10:00:00'], 
               'Item': ['A','B','D','A','D','A'],
                })

df['Start Time'] = pd.to_datetime(df['Start Time'])
df['End Time'] = pd.to_datetime(df['End Time'])

df['diff'] = (df.groupby('ID')
                .apply(lambda x: x['End Time'].shift(1) - x['Start Time'].shift(1))
                .reset_index(drop=True))

Intended Output:

   ID          Start Time            End Time Item            diff
0  10 2019-08-02 09:00:00 2019-08-04 15:00:00    A             NaT
1  10 2019-08-03 10:50:00 2019-08-04 16:00:00    B             NaT
2  10 2019-08-05 16:00:00 2019-08-05 16:00:00    D 1 days 01:00:00
3  20 2019-08-04 08:00:00 2019-08-04 14:00:00    A             NaT
4  20 2019-08-04 15:30:00 2019-08-05 20:30:00    D 0 days 01:30:00
5  30 2019-08-06 11:00:00 2019-08-07 10:00:00    A             NaT

mozway · Accepted Answer

df2 = df.set_index('ID')
df2.query('Item == "D"')['Start Time']-df2.query('Item == "A"')['End Time']

output:

ID
10   2 days 05:30:00
20   0 days 20:30:00
30               NaT
dtype: timedelta64[ns]

older answer

The issue is your fillna, you can't have strings in a timedelta column:

df['diff'] = (df.groupby('ID')
                .apply(lambda x: x['End Time'].shift(1) - x['Start Time'].shift(1))
                #.fillna('-')  # the issue is here
                .reset_index(drop=True))

output:

   ID          Start Time            End Time Item            diff
0  10 2019-08-02 09:00:00 2019-08-02 09:30:00    A             NaT
1  10 2019-08-03 10:50:00 2019-08-03 11:00:00    B 0 days 00:30:00
2  10 2019-08-04 15:00:00 2019-08-05 16:00:00    C 0 days 00:10:00
3  20 2019-08-04 08:00:00 2019-08-04 14:00:00    B             NaT
4  20 2019-08-05 10:30:00 2019-08-05 20:30:00    C 0 days 06:00:00
5  30 2019-08-06 11:00:00 2019-08-07 10:00:00    A             NaT

Measure different between timestamps using conditions - python

Answers (2)

older answer

Related Questions