afroduck
afroduck

Reputation: 173

How to create a Duration column in Python from Start and Stop times?

I am trying to make a column with time duration (hours - first one would be 1.424 hrs for example). My data looks like this.

enter image description here

Clock_Start and Clock_Stop are objects.

enter image description here

I have tried the following:

df['Duration'] = (df_nyc['Clock_Stop'] - df['Clock_Start']).astype('timedelta64[m]').astype(int)

... But I get the following error:

TypeError: unsupported operand type(s) for -: 'datetime.time' and 'datetime.time'

Can anybody tell me what I am doing wrong here? Any help would be greatly appreciated :)

Upvotes: 1

Views: 651

Answers (2)

FObersteiner
FObersteiner

Reputation: 25544

you have something like

from datetime import time
import pandas as pd

df = pd.DataFrame({"Clock_Start": [time(20,52), time(23,19,55), time(12,17,30)],
                   "Clock_Stop": [time(22,17,27), time(23,41,46), time(3,44)]})

df
  Clock_Start Clock_Stop
0    20:52:00   22:17:27
1    23:19:55   23:41:46
2    12:17:30   03:44:00

Note that there is an ambiguity here: row 3, is 03:44:00 on the next day? Or maybe two days later? Or a day before?! That's why you can't just add or subtract time. This is true in general, not just if t0 > t1.

But you can remove the ambiguity by using the timedelta type and specify for instance that if the difference is negative, assume one day apart, else same day. EX:

# convert datetime.time -> pandas.Timedelta
df["Clock_Start"] = pd.to_timedelta(df["Clock_Start"].astype(str))
df["Clock_Stop"] = pd.to_timedelta(df["Clock_Stop"].astype(str))

df["Duration"] = df["Clock_Stop"] - df["Clock_Start"]

# df["Duration"]
# 0     0 days 01:25:27
# 1     0 days 00:21:51
# 2   -1 days +15:26:30
# Name: Duration, dtype: timedelta64[ns]

df.loc[df["Duration"]<pd.Timedelta(0)] += pd.Timedelta(days=1)
### alternatively ###
# df["Duration"] = np.where((df["Clock_Stop"]-df["Clock_Start"]) > pd.Timedelta(0), 
#                           df["Clock_Stop"]-df["Clock_Start"], 
#                           (df["Clock_Stop"]-df["Clock_Start"]) + pd.Timedelta(days=1))

# df["Duration"]
# 0   0 days 01:25:27
# 1   0 days 00:21:51
# 2   0 days 15:26:30
# Name: Duration, dtype: timedelta64[ns]

Now you can get other representations of the duration, e.g. fractional hours like

df["Duration"].dt.total_seconds()/3600
0     1.424167
1     0.364167
2    15.441667
Name: Duration, dtype: float64

Upvotes: 1

Code Different
Code Different

Reputation: 93151

Time alone cannot be added or subtracted. You need to convert to Timedelta. There is a stop time that is before the start time, which I assume is because it goes past midnight. In that case, add 24 hours to the result:

h = (pd.to_timedelta(df["Clock_Stop"]) - pd.to_timedelta(df["Clock_Start"])) / pd.Timedelta(hours=1)
h = h.mask(h < 0, h + 24)

Upvotes: 1

Related Questions