Reputation: 171
I have a dataframe called "df" with the columns Date, ID, and Activity. It is sorted on both "ID" and "Date" so that rows with the same ID are grouped together and arranged chronologically, earliest to latest.
I would like to subtract the previous row's "Date" from the current row's "Date" in order to find the duration between the two times.
Date             ID    Activity  Duration
4/12/2018 7:58   1111  1
4/12/2018 8:40   1111  0
4/12/2018 8:42   1111  1
4/12/2018 9:26   1111  0
My script thus far:
for row in df.itertuples():
    callDate = datetime.strptime(row.Date, "%m/%d/%y %I:%M %p")
    previousRecord = df['Date'].shift(-1).strptime(row.Date, "%m/%d/%y %I:%M %p")
    duration = callDate - previousRecord
Line 3 errors out because I can't access the previous row AND convert it to a datetime object so that duration can be computed as a timedelta. If I remove the strptime call on line 3, the script errors out because it's trying to subtract a string from a datetime.
Is there a more elegant way to do this? Also, bonus points if I can leave off the Month/Day/Year when I put the value in the Duration column.
Thank you so much! I greatly appreciate the help.
Upvotes: 1
Views: 1719
Reputation: 294218
Use groupby with diff:
df.assign(Duration=df.groupby('ID').Date.diff().fillna(0))
Date ID Activity Duration
0 2018-04-12 07:58:00 1111 1 00:00:00
1 2018-04-12 08:40:00 1111 0 00:42:00
2 2018-04-12 08:42:00 1111 1 00:02:00
3 2018-04-12 09:26:00 1111 0 00:44:00
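The one-liner above assumes "Date" already holds datetimes. Here is a minimal end-to-end sketch starting from strings like those in the question. The format string "%m/%d/%Y %H:%M" is an assumption inferred from the sample rows, and the "DurationMinutes" column name is made up for illustration. It also fills the first row of each ID with pd.Timedelta(0) rather than the integer 0, since recent pandas versions may not accept an integer fill value for timedelta data.

```python
import pandas as pd

# Sample rows copied from the question; Date starts out as plain strings.
df = pd.DataFrame({
    "Date": ["4/12/2018 7:58", "4/12/2018 8:40", "4/12/2018 8:42", "4/12/2018 9:26"],
    "ID": [1111, 1111, 1111, 1111],
    "Activity": [1, 0, 1, 0],
})

# Parse the whole column at once instead of calling strptime row by row.
# The format string is an assumption based on the sample data.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y %H:%M")

# Make sure each ID's rows are in chronological order before differencing.
df = df.sort_values(["ID", "Date"]).reset_index(drop=True)

# diff() subtracts the previous row's Date within each ID group.
# The first row of every group has no predecessor (NaT), so fill it
# with a zero-length Timedelta.
df["Duration"] = df.groupby("ID")["Date"].diff().fillna(pd.Timedelta(0))

# Optional: the same duration as a plain number of minutes
# ("DurationMinutes" is a hypothetical column name).
df["DurationMinutes"] = df["Duration"].dt.total_seconds() / 60

print(df)
```

As for the bonus question: the Duration values are timedeltas, so they carry no month, day, or year to begin with. If you would rather store a plain number than a timedelta, the minutes column is one option.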
Upvotes: 3