ASH
ASH

Reputation: 20362

How can I find the difference between two columns with times?

I have a dataframe with this:

VendorID lpep_pickup_datetime lpep_dropoff_datetime store_and_fwd_flag  \
0         2  2017-03-01 00:30:18   2017-03-01 00:30:47                  N   
1         2  2017-03-01 00:11:58   2017-03-01 01:05:54                  N   
2         2  2017-03-01 00:54:44   2017-03-01 01:03:42                  N   
3         2  2017-03-01 00:00:07   2017-03-01 00:04:39                  N   
4         2  2017-03-01 00:17:05   2017-03-01 00:36:44                  N

I tried this:

data['trip_duration'] = data['lpep_dropoff_datetime'] - data['lpep_pickup_datetime']
data['trip_duration'] = data['trip_duration']/np.timedelta64(1,'m')

I also tried this:

data['trip_duration'] = data['lpep_dropoff_datetime'].values - data['lpep_pickup_datetime'].values
data['trip_duration'] = data['trip_duration']/np.timedelta64(1,'m')

In both cases, I get this error:

TypeError: unsupported operand type(s) for -: 'str' and 'str'

I'm trying to create a new column named 'trip_duration' which shows the difference in times. Finally, if the start and end dates were different, like before and after midnight, I'd like the code to handle that seamlessly.

Upvotes: 0

Views: 34

Answers (1)

J_H
J_H

Reputation: 20611

It's not clear how the '2017-03-01 00:30:18' ascii timestamp value got into your dataframe. If you populated it with pd.read_csv() and there weren't any extraneous (non-date) values, then pandas should have parsed the dates for you. You can use this to post-process:

import datetime as dt

def str_to_stamp(s):
    return dt.datetime.strptime(s, '%Y-%m-%d %H:%M:%S')

data['lpep_dropoff_datetime'] = data['lpep_dropoff_datetime'].apply(str_to_stamp)
data['lpep_pickup_datetime'] = data['lpep_pickup_datetime'].apply(str_to_stamp)

String subtraction doesn't work very well, but subtracting timestamps will give you time deltas.

Upvotes: 1

Related Questions