Reputation: 20362
I have a dataframe with this:
VendorID lpep_pickup_datetime lpep_dropoff_datetime store_and_fwd_flag \
0 2 2017-03-01 00:30:18 2017-03-01 00:30:47 N
1 2 2017-03-01 00:11:58 2017-03-01 01:05:54 N
2 2 2017-03-01 00:54:44 2017-03-01 01:03:42 N
3 2 2017-03-01 00:00:07 2017-03-01 00:04:39 N
4 2 2017-03-01 00:17:05 2017-03-01 00:36:44 N
I tried this:
data['trip_duration'] = data['lpep_dropoff_datetime'] - data['lpep_pickup_datetime']
data['trip_duration'] = data['trip_duration']/np.timedelta64(1,'m')
I also tried this:
data['trip_duration'] = data['lpep_dropoff_datetime'].values - data['lpep_pickup_datetime'].values
data['trip_duration'] = data['trip_duration']/np.timedelta64(1,'m')
In both cases, I get this error:
TypeError: unsupported operand type(s) for -: 'str' and 'str'
I'm trying to create a new column named 'trip_duration' which shows the difference in times. Finally, if the start and end dates were different, like before and after midnight, I'd like the code to handle that seamlessly.
Upvotes: 0
Views: 34
Reputation: 20611
It's not clear how the '2017-03-01 00:30:18'
ascii timestamp value got into your dataframe.
If you populated it with pd.read_csv()
and there weren't any extraneous (non-date) values, then pandas should have parsed the dates for you.
You can use this to post-process:
import datetime as dt
def str_to_stamp(s):
return dt.datetime.strptime(s, '%Y-%m-%d %H:%M:%S')
data['lpep_dropoff_datetime'] = data['lpep_dropoff_datetime'].apply(str_to_stamp)
data['lpep_pickup_datetime'] = data['lpep_pickup_datetime'].apply(str_to_stamp)
String subtraction doesn't work very well, but subtracting timestamps will give you time deltas.
Upvotes: 1