Reputation: 344
I have a dataframe that looks like the following:
arrival departure
0 23:55:00 23:57:00
1 23:57:00 23:59:00
2 23:59:00 24:01:00
3 24:01:00 24:03:00
4 24:03:00 24:05:00
I am working with data that cover a whole day and part of the day after. Data are (most of the time) in the HH:MM:SS
format. However some time values are higher than 23:59:59 and go up to 27:00:00.
I would like to get the time difference between departure
and arrival
columns.
I tried using datetime
to do that but I guess something went wrong:
FMT = '%H:%M:%S'
delta = datetime.strptime(df['departure'], FMT) - datetime.strptime(df['arrival'], FMT)
Which raises the following error:
ValueError: time data '24:01:00' does not match format '%H:%M:%S'
Is there a way to get the time difference between these two columns even though their format do not always match the HH:MM:SS
format?
Upvotes: 1
Views: 213
Reputation: 6822
You could use timedelta from datetime
import datetime
delta1 = datetime.timedelta(hours=23, minutes=59, seconds=0)
delta2 = datetime.timedelta(hours=24, minutes=01, seconds=0)
timedelta = delta2 - delta1
>>> timedelta # or timedelta.to_seconds()
datetime.timedelta(seconds=120)
Give you the delta in seconds. Full example:
import datetime
arrival = "24:01:00"
departure = "24:03:00"
def get_time_from_string(t):
return dict(
zip(["hours", "minutes", "seconds"], list(map(lambda x: int(x), t.split(":"))),)
)
delta1 = datetime.timedelta(**get_time_from_string(arrival))
delta2 = datetime.timedelta(**get_time_from_string(departure))
delta = delta2 - delta1
print(delta.total_seconds())
Upvotes: 2