jonboy
jonboy

Reputation: 368

Measure different between timestamps using conditions - python

I'm trying to measure the difference between timestamps using certain conditions. Using below, for each unique ID, I'm hoping to subtract the End Time where Item == A and the Start Time where Item == D.

So the timestamps are actually located on separate rows.

At the moment my process is returning an error. I'm also hoping to drop the .shift() for something more robust as each unique ID will have different combinations. For ex, A,B,C,D - A,B,D - A,D etc.

df = pd.DataFrame({'ID': [10,10,10,20,20,30],
               'Start Time': ['2019-08-02 09:00:00','2019-08-03 10:50:00','2019-08-05 16:00:00','2019-08-04 08:00:00','2019-08-04 15:30:00','2019-08-06 11:00:00'], 
               'End Time': ['2019-08-04 15:00:00','2019-08-04 16:00:00','2019-08-05 16:00:00','2019-08-04 14:00:00','2019-08-05 20:30:00','2019-08-07 10:00:00'], 
               'Item': ['A','B','D','A','D','A'],
                })

df['Start Time'] = pd.to_datetime(df['Start Time'])
df['End Time'] = pd.to_datetime(df['End Time'])

df['diff'] = (df.groupby('ID')
                .apply(lambda x: x['End Time'].shift(1) - x['Start Time'].shift(1))
                .reset_index(drop=True))

Intended Output:

   ID          Start Time            End Time Item            diff
0  10 2019-08-02 09:00:00 2019-08-04 15:00:00    A             NaT
1  10 2019-08-03 10:50:00 2019-08-04 16:00:00    B             NaT
2  10 2019-08-05 16:00:00 2019-08-05 16:00:00    D 1 days 01:00:00
3  20 2019-08-04 08:00:00 2019-08-04 14:00:00    A             NaT
4  20 2019-08-04 15:30:00 2019-08-05 20:30:00    D 0 days 01:30:00
5  30 2019-08-06 11:00:00 2019-08-07 10:00:00    A             NaT

Upvotes: 1

Views: 42

Answers (2)

jezrael
jezrael

Reputation: 862761

IIUC use:

df1 = df.pivot('ID','Item')
print (df1)
              Start Time                                          \
Item                   A                   B                   D   
ID                                                                 
10   2019-08-02 09:00:00 2019-08-03 10:50:00 2019-08-04 15:00:00   
20   2019-08-04 08:00:00                 NaT 2019-08-05 10:30:00   
30   2019-08-06 11:00:00                 NaT                 NaT   

                End Time                                          
Item                   A                   B                   D  
ID                                                                
10   2019-08-02 09:30:00 2019-08-03 11:00:00 2019-08-05 16:00:00  
20   2019-08-04 14:00:00                 NaT 2019-08-05 20:30:00  
30   2019-08-07 10:00:00                 NaT                 NaT  

a = df1[('Start Time','D')].sub(df1[('End Time','A')])
print (a)
ID
10   2 days 05:30:00
20   0 days 20:30:00
30               NaT
dtype: timedelta64[ns]

Upvotes: 1

mozway
mozway

Reputation: 260835

df2 = df.set_index('ID')
df2.query('Item == "D"')['Start Time']-df2.query('Item == "A"')['End Time']

output:

ID
10   2 days 05:30:00
20   0 days 20:30:00
30               NaT
dtype: timedelta64[ns]

older answer

The issue is your fillna, you can't have strings in a timedelta column:

df['diff'] = (df.groupby('ID')
                .apply(lambda x: x['End Time'].shift(1) - x['Start Time'].shift(1))
                #.fillna('-')  # the issue is here
                .reset_index(drop=True))

output:

   ID          Start Time            End Time Item            diff
0  10 2019-08-02 09:00:00 2019-08-02 09:30:00    A             NaT
1  10 2019-08-03 10:50:00 2019-08-03 11:00:00    B 0 days 00:30:00
2  10 2019-08-04 15:00:00 2019-08-05 16:00:00    C 0 days 00:10:00
3  20 2019-08-04 08:00:00 2019-08-04 14:00:00    B             NaT
4  20 2019-08-05 10:30:00 2019-08-05 20:30:00    C 0 days 06:00:00
5  30 2019-08-06 11:00:00 2019-08-07 10:00:00    A             NaT

Upvotes: 1

Related Questions