dodoelhos
dodoelhos

Reputation: 217

Python - Using Pandas to calculate time between dates for each unique identifier

I have been stuck with this for a while and i would appreciate some help. Here is a simplified version of a pandas dataframe i am working with:

ID  Date
1   16/07/2020 14:11:12
1   17/07/2020 15:12:11
1   16/07/2020 15:32:44
1   16/07/2020 17:11:24
2   19/07/2020 06:11:12
2   19/07/2020 12:12:11
2   20/07/2020 14:32:44
2   20/07/2020 02:11:24

I would like to calculate the time by using the minimum value for each ID and subtracting it for rows that hold the same ID. However, i want the format to be in hh:mm:ss where if the difference is more than 24 hours it would be 37:44:22 for example.

Here is how i would the results should look like.

ID  Date                    time
1   16/07/2020 14:11:12     00:00:00
1   17/07/2020 18:32:23     28:21:11
1   16/07/2020 18:32:44     04:21:32
1   16/07/2020 17:11:24     00:00:12
2   19/07/2020 06:11:12     00:00:00
2   19/07/2020 12:12:12     06:01:00
2   20/07/2020 14:32:44     26:20:32

I have found previous questions such as this Calculate Pandas DataFrame Time Difference Between Two Columns in Hours and Minutes but most questions subtract dates from the same row. However, in my case i have to find the id with the earliest date, and use it to find the time difference for rows that carry the same ID.

Upvotes: 1

Views: 917

Answers (1)

jezrael
jezrael

Reputation: 863166

First convert column to datetimes by to_datetime, then subtract minimal per group by GroupBy.transform and last use custom function for custom format of timedeltas:

df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)

def f(x):
    ts = x.total_seconds()
    hours, remainder = divmod(ts, 3600)
    minutes, seconds = divmod(remainder, 60)
    return ('{:02d}:{:02d}:{:02d}').format(int(hours), int(minutes), int(seconds)) 

df['time1'] = df['Date'].sub(df.groupby('ID')['Date'].transform('min')).apply(f)
print (df)
   ID                Date      time     time1
0   1 2020-07-16 14:11:12  00:00:00  00:00:00
1   1 2020-07-17 18:32:23  28:21:11  28:21:11
2   1 2020-07-16 18:32:44  04:21:32  04:21:32
3   1 2020-07-16 17:11:24  00:00:12  03:00:12
4   2 2020-07-19 06:11:12  00:00:00  00:00:00
5   2 2020-07-19 12:12:12  06:01:00  06:01:00
6   2 2020-07-20 14:32:44  26:20:32  32:21:32

Upvotes: 1

Related Questions