Reputation: 419
I am trying to subtract two timeseries to get a difference in time.
The source data in train.csv looks like
id,vendor_id,pickup_datetime,dropoff_datetime
id2875421,2,2016-03-14 17:24:55,2016-03-14 17:32:30
id2377394,1,2016-06-12 00:43:35,2016-06-12 00:54:38
My code is
train = pd.read_csv('./train.csv')
out = train.dropoff_datetime[0:5] - train.dropoff_datetime[0:5]
I get an error which ends in:
TypeError: unsupported operand type(s) for -: 'str' and 'str'
Presumably this is because the datetime columns are being stored as objects or strings rather than as datetime objects. In some installations pandas is able to automatically recognize date columns such as this. How can this be set up? My experience is that pandas unpredictably recognizes the date column in different formats. What am I doing wrong? Thanks in advance.
I am using Python 3.6 and pandas 0.20 in a jupyter notebook.
Upvotes: 2
Views: 2573
Reputation: 862681
Automatic parsing datetimes columns without no specifying them is not possible in pandas.
You need define columns for parsing to datetime by positions in parameter parse_dates
in read_csv
:
train = pd.read_csv('./train.csv', parse_dates=[2,3])
Or by columns names:
train = pd.read_csv('./train.csv', parse_dates=['pickup_datetime','dropoff_datetime'])
print (train.dtypes)
id object
vendor_id int64
pickup_datetime datetime64[ns]
dropoff_datetime datetime64[ns]
dtype: object
out=train.dropoff_datetime[0:5]-train.pickup_datetime[0:5]
print (out)
0 00:07:35
1 00:11:03
dtype: timedelta64[ns]
For new column:
train['difference']=train.dropoff_datetime-train.pickup_datetime
print (train)
id vendor_id pickup_datetime dropoff_datetime difference
0 id2875421 2 2016-03-14 17:24:55 2016-03-14 17:32:30 00:07:35
1 id2377394 1 2016-06-12 00:43:35 2016-06-12 00:54:38 00:11:03
Upvotes: 1
Reputation: 419
The answer by @jezrael is good.
Also, the following would work.
train = pd.read_csv('./train.csv')
train['dropoff_datetime'] = pd.to_datetime(train.dropoff_datetime)
train['pickup_datetime'] = pd.to_datetime(train.pickup_datetime)
Upvotes: 0
Reputation: 109546
train = pd.read_csv('./train.csv')
>>>> train.assign(out=pd.to_datetime(train.dropoff_datetime) - pd.to_datetime(train.pickup_datetime))
id vendor pickup_datetime dropoff_datetime out
0 id2875421 1 2016-03-14 17:24:55 2016-03-14 17:32:30 00:07:35
1 id2377394 2 2016-06-12 00:43:35 2016-06-12 00:54:38 00:11:03
Upvotes: 1