Reputation: 368
I'm trying to measure the difference between timestamps using certain conditions. Using below, for each unique ID
, I'm hoping to subtract the End Time
where Item == A
and the Start Time
where Item == D
.
So the timestamps are actually located on separate rows.
At the moment my process is returning an error. I'm also hoping to drop the .shift()
for something more robust as each unique ID
will have different combinations. For ex, A,B,C,D
- A,B,D
- A,D
etc.
df = pd.DataFrame({'ID': [10,10,10,20,20,30],
'Start Time': ['2019-08-02 09:00:00','2019-08-03 10:50:00','2019-08-05 16:00:00','2019-08-04 08:00:00','2019-08-04 15:30:00','2019-08-06 11:00:00'],
'End Time': ['2019-08-04 15:00:00','2019-08-04 16:00:00','2019-08-05 16:00:00','2019-08-04 14:00:00','2019-08-05 20:30:00','2019-08-07 10:00:00'],
'Item': ['A','B','D','A','D','A'],
})
df['Start Time'] = pd.to_datetime(df['Start Time'])
df['End Time'] = pd.to_datetime(df['End Time'])
df['diff'] = (df.groupby('ID')
.apply(lambda x: x['End Time'].shift(1) - x['Start Time'].shift(1))
.reset_index(drop=True))
Intended Output:
ID Start Time End Time Item diff
0 10 2019-08-02 09:00:00 2019-08-04 15:00:00 A NaT
1 10 2019-08-03 10:50:00 2019-08-04 16:00:00 B NaT
2 10 2019-08-05 16:00:00 2019-08-05 16:00:00 D 1 days 01:00:00
3 20 2019-08-04 08:00:00 2019-08-04 14:00:00 A NaT
4 20 2019-08-04 15:30:00 2019-08-05 20:30:00 D 0 days 01:30:00
5 30 2019-08-06 11:00:00 2019-08-07 10:00:00 A NaT
Upvotes: 1
Views: 42
Reputation: 862761
IIUC use:
df1 = df.pivot('ID','Item')
print (df1)
Start Time \
Item A B D
ID
10 2019-08-02 09:00:00 2019-08-03 10:50:00 2019-08-04 15:00:00
20 2019-08-04 08:00:00 NaT 2019-08-05 10:30:00
30 2019-08-06 11:00:00 NaT NaT
End Time
Item A B D
ID
10 2019-08-02 09:30:00 2019-08-03 11:00:00 2019-08-05 16:00:00
20 2019-08-04 14:00:00 NaT 2019-08-05 20:30:00
30 2019-08-07 10:00:00 NaT NaT
a = df1[('Start Time','D')].sub(df1[('End Time','A')])
print (a)
ID
10 2 days 05:30:00
20 0 days 20:30:00
30 NaT
dtype: timedelta64[ns]
Upvotes: 1
Reputation: 260835
df2 = df.set_index('ID')
df2.query('Item == "D"')['Start Time']-df2.query('Item == "A"')['End Time']
output:
ID
10 2 days 05:30:00
20 0 days 20:30:00
30 NaT
dtype: timedelta64[ns]
The issue is your fillna
, you can't have strings in a timedelta column:
df['diff'] = (df.groupby('ID')
.apply(lambda x: x['End Time'].shift(1) - x['Start Time'].shift(1))
#.fillna('-') # the issue is here
.reset_index(drop=True))
output:
ID Start Time End Time Item diff
0 10 2019-08-02 09:00:00 2019-08-02 09:30:00 A NaT
1 10 2019-08-03 10:50:00 2019-08-03 11:00:00 B 0 days 00:30:00
2 10 2019-08-04 15:00:00 2019-08-05 16:00:00 C 0 days 00:10:00
3 20 2019-08-04 08:00:00 2019-08-04 14:00:00 B NaT
4 20 2019-08-05 10:30:00 2019-08-05 20:30:00 C 0 days 06:00:00
5 30 2019-08-06 11:00:00 2019-08-07 10:00:00 A NaT
Upvotes: 1