Reputation: 8247
I have the following dataframe in pandas:
start_date start_time end_time
2018-01-01 23:55:00 00:05:00
2018-01-02 00:05:00 00:10:00
2018-01-03 23:59:00 00:05:00
I want to calculate the time difference between start_time and end_time. For the 1st and 3rd observations, however, end_time falls on the next day.
How can I do this in pandas?
Currently, wherever end_time is less than start_time, I create an additional column called end_date that increments start_date by one day, and then I subtract the two timestamps.
Is there any other way to do it?
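For reference, the current approach described above might look roughly like the sketch below. It assumes all three columns are stored as strings; the variable names `start` and `end` are illustrative only and not from my actual code.

```python
import pandas as pd

# Sample data from the question (assumed to be plain strings)
df = pd.DataFrame({
    "start_date": ["2018-01-01", "2018-01-02", "2018-01-03"],
    "start_time": ["23:55:00", "00:05:00", "23:59:00"],
    "end_time": ["00:05:00", "00:10:00", "00:05:00"],
})

# Build full timestamps, initially assuming both times fall on start_date
start = pd.to_datetime(df["start_date"] + " " + df["start_time"])
end = pd.to_datetime(df["start_date"] + " " + df["end_time"])

# Where end is earlier than start, the end time belongs to the next day
end = end.where(end >= start, end + pd.Timedelta(days=1))

df["end_date"] = end.dt.date
df["diff"] = end - start
print(df)
```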
Upvotes: 1
Views: 96
Reputation: 863801
A solution working with timedeltas: if the days component of the difference equals -1, add one day:
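import pandas as pd

# Convert the time strings to timedeltas so they can be subtracted
df['start_time'] = pd.to_timedelta(df['start_time'])
df['end_time'] = pd.to_timedelta(df['end_time'])
d = df['end_time'] - df['start_time']
# days == -1 means the difference is negative, i.e. end_time is on the next day
df['diff'] = d.mask(d.dt.days == -1, d + pd.Timedelta(1, unit='d'))
print(df)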
start_date start_time end_time diff
0 2018-01-01 23:55:00 00:05:00 00:10:00
1 2018-01-02 00:05:00 00:10:00 00:05:00
2 2018-01-03 23:59:00 00:05:00 00:06:00
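Another solution, using numpy.where:
import numpy as np

# Assumes start_time and end_time were already converted with pd.to_timedelta
s = df['end_time'] - df['start_time']
df['diff'] = np.where(df['end_time'] < df['start_time'],
                      s + pd.Timedelta(1, unit='d'),
                      s)
print(df)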
start_date start_time end_time diff
0 2018-01-01 23:55:00 00:05:00 00:10:00
1 2018-01-02 00:05:00 00:10:00 00:05:00
2 2018-01-03 23:59:00 00:05:00 00:06:00
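Both solutions above add one day to negative differences. A more compact variant of the same idea, offered only as a sketch, is to take the difference modulo one day. It assumes that no interval is 24 hours or longer and that the times are strings as in the question; the name `one_day` is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "start_date": ["2018-01-01", "2018-01-02", "2018-01-03"],
    "start_time": ["23:55:00", "00:05:00", "23:59:00"],
    "end_time": ["00:05:00", "00:10:00", "00:05:00"],
})

one_day = pd.Timedelta(days=1)

# Raw difference; negative when end_time is on the following day
raw = pd.to_timedelta(df["end_time"]) - pd.to_timedelta(df["start_time"])

# Modulo one day wraps negative differences into the range [0, 1 day)
df["diff"] = raw % one_day
print(df)
```

This avoids the explicit condition, at the cost of silently wrapping any interval that really spans more than one day.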
Upvotes: 2