sciorms

Pandas - Iterate through a DataFrame and calculate the difference between a column value and the previous row's value

I have a DataFrame called "df" with the columns Date, ID, and Activity. It is sorted on both "ID" and "Date" so that rows with the same ID are grouped together and arranged chronologically, earliest to latest.

I would like to subtract the previous row's "Date" from the current row's "Date" to find the duration between the two times.

Date            ID      Activity    Duration
4/12/2018 7:58  1111    1
4/12/2018 8:40  1111    0
4/12/2018 8:42  1111    1
4/12/2018 9:26  1111    0

My script thus far:

for row in df.itertuples():
    callDate = datetime.strptime(row.Date, "%m/%d/%y  %I:%M %p")
    previousRecord = df['Date'].shift(-1).strptime(row.Date, "%m/%d/%y  %I:%M %p")
    duration = callDate - previousRecord

The third line (previousRecord = ...) errors out because I can't both access the previous row AND convert it to a datetime object, so subtracting the two never produces a timedelta. If I remove the strptime call from that line, the script errors out instead because it is subtracting a string from a datetime.
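For comparison, here is a minimal sketch of a row-by-row loop that does run, assuming the Date strings look exactly like the sample (month/day/four-digit year, 24-hour time, no AM/PM). The original format string "%m/%d/%y  %I:%M %p" (with two spaces) expects a two-digit year, a 12-hour clock and an AM/PM marker, none of which appear in the sample, and df['Date'].shift(-1) returns a whole Series (holding the next row's value, not the previous one) that has no strptime method. Remembering the previously parsed date in a variable sidesteps both issues; the names FMT, prev_id and prev_date are illustrative, not from the question.

```python
from datetime import datetime

import pandas as pd

# Sample data from the question, with Date stored as strings
df = pd.DataFrame({
    "Date": ["4/12/2018 7:58", "4/12/2018 8:40", "4/12/2018 8:42", "4/12/2018 9:26"],
    "ID": [1111, 1111, 1111, 1111],
    "Activity": [1, 0, 1, 0],
})

# Assumed format: four-digit year (%Y) and 24-hour clock (%H), no AM/PM
FMT = "%m/%d/%Y %H:%M"

durations = []
prev_id = None    # ID of the previous row (hypothetical helper variable)
prev_date = None  # parsed Date of the previous row (hypothetical helper variable)
for row in df.itertuples():
    call_date = datetime.strptime(row.Date, FMT)
    if row.ID == prev_id:
        # Same ID as the previous row: time elapsed since that row
        durations.append(call_date - prev_date)
    else:
        # First row for this ID: there is no previous row to subtract
        durations.append(None)
    prev_id, prev_date = row.ID, call_date

df["Duration"] = durations
print(df)
```

Loops like this are typically much slower than vectorized pandas operations on large frames, which is one reason a groupby-based approach is usually preferred.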

Is there a more elegant way to do this? Also, bonus points if I can leave off the Month/Day/Year when I put the value in the Duration column.
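On the bonus point: the difference of two datetimes is a timedelta, which carries no month/day/year at all. If a plain HH:MM string is wanted for display, one option is to format each Timedelta by hand. A minimal sketch, assuming durations like those expected for the sample rows (built directly here for brevity; to_hh_mm is a hypothetical helper name):

```python
import pandas as pd

# Elapsed times matching the sample rows (first row set to zero),
# constructed directly instead of computed from dates
durations = pd.Series(pd.to_timedelta([0, 42, 2, 44], unit="min"))

def to_hh_mm(td: pd.Timedelta) -> str:
    """Format a Timedelta as HH:MM; hours may exceed 24, seconds are truncated."""
    total_minutes = int(td.total_seconds() // 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours:02d}:{minutes:02d}"

print(durations.map(to_hh_mm).tolist())
# ['00:00', '00:42', '00:02', '00:44']
```

Note that this turns the column into strings, so it is best applied only for display, after any arithmetic on the durations is done.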

Thank you so much! I greatly appreciate the help.

Answers (1)

piRSquared

Convert Date to datetimes first (e.g. df['Date'] = pd.to_datetime(df['Date'])), then use groupby with diff:

df.assign(Duration=df.groupby('ID').Date.diff().fillna(0))

                 Date    ID  Activity Duration
0 2018-04-12 07:58:00  1111         1 00:00:00
1 2018-04-12 08:40:00  1111         0 00:42:00
2 2018-04-12 08:42:00  1111         1 00:02:00
3 2018-04-12 09:26:00  1111         0 00:44:00
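A self-contained sketch of the same approach, starting from the question's sample data with Date as strings. The format string "%m/%d/%Y %H:%M" is an assumption based on the sample rows, and pd.Timedelta(0) is used as an explicit zero duration for each ID's first row. The sketch also checks that, within each ID, diff() matches subtracting shift(1), i.e. the previous row, which is what the question's shift(-1) attempt was reaching for:

```python
import pandas as pd

# Sample data from the question; Date starts out as strings
df = pd.DataFrame({
    "Date": ["4/12/2018 7:58", "4/12/2018 8:40", "4/12/2018 8:42", "4/12/2018 9:26"],
    "ID": [1111, 1111, 1111, 1111],
    "Activity": [1, 0, 1, 0],
})

# diff() needs a datetime64 column, so parse the strings first
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y %H:%M")

# Group rows by ID and order them chronologically within each ID
df = df.sort_values(["ID", "Date"])

# Within each ID, subtract the previous row's Date from the current one;
# each ID's first row has no predecessor (NaT) and is filled with zero
df["Duration"] = df.groupby("ID")["Date"].diff().fillna(pd.Timedelta(0))

# Equivalent formulation: current Date minus the previous row's Date (shift(1))
alt = (df["Date"] - df.groupby("ID")["Date"].shift(1)).fillna(pd.Timedelta(0))
assert (alt == df["Duration"]).all()

print(df)
```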
