Chogg
Chogg

Reputation: 419

Automatically recognising datetime in pandas

I am trying to subtract two timeseries to get a difference in time.

The source data in train.csv looks like

id,vendor_id,pickup_datetime,dropoff_datetime
id2875421,2,2016-03-14 17:24:55,2016-03-14 17:32:30
id2377394,1,2016-06-12 00:43:35,2016-06-12 00:54:38

My code is

train = pd.read_csv('./train.csv')
out = train.dropoff_datetime[0:5] - train.dropoff_datetime[0:5]

I get an error which ends in:

TypeError: unsupported operand type(s) for -: 'str' and 'str'

Presumably this is because the datetime columns are being stored as objects or strings rather than as datetime objects. In some installations pandas is able to automatically recognize date columns such as this. How can this be set up? My experience is that pandas unpredictably recognizes the date column in different formats. What am I doing wrong? Thanks in advance.

I am using Python 3.6 and pandas 0.20 in a jupyter notebook.

Upvotes: 2

Views: 2573

Answers (3)

jezrael
jezrael

Reputation: 862681

Automatic parsing datetimes columns without no specifying them is not possible in pandas.

You need define columns for parsing to datetime by positions in parameter parse_dates in read_csv:

train = pd.read_csv('./train.csv', parse_dates=[2,3])

Or by columns names:

train = pd.read_csv('./train.csv', parse_dates=['pickup_datetime','dropoff_datetime'])

print (train.dtypes)
id                          object
vendor_id                    int64
pickup_datetime     datetime64[ns]
dropoff_datetime    datetime64[ns]
dtype: object

out=train.dropoff_datetime[0:5]-train.pickup_datetime[0:5]
print (out)
0   00:07:35
1   00:11:03
dtype: timedelta64[ns]

For new column:

train['difference']=train.dropoff_datetime-train.pickup_datetime
print (train)
          id  vendor_id     pickup_datetime    dropoff_datetime difference
0  id2875421          2 2016-03-14 17:24:55 2016-03-14 17:32:30   00:07:35
1  id2377394          1 2016-06-12 00:43:35 2016-06-12 00:54:38   00:11:03

Upvotes: 1

Chogg
Chogg

Reputation: 419

The answer by @jezrael is good.

Also, the following would work.

train = pd.read_csv('./train.csv')
train['dropoff_datetime'] = pd.to_datetime(train.dropoff_datetime)
train['pickup_datetime'] = pd.to_datetime(train.pickup_datetime)

Upvotes: 0

Alexander
Alexander

Reputation: 109546

train = pd.read_csv('./train.csv')
>>>> train.assign(out=pd.to_datetime(train.dropoff_datetime) - pd.to_datetime(train.pickup_datetime))
          id  vendor      pickup_datetime     dropoff_datetime      out
0  id2875421       1  2016-03-14 17:24:55  2016-03-14 17:32:30 00:07:35
1  id2377394       2  2016-06-12 00:43:35  2016-06-12 00:54:38 00:11:03

Upvotes: 1

Related Questions