Reputation: 1303
I'm reading Excel files and need to properly handle dates when reading them. Oftentimes columns will be sparsely populated with dates, and the rest will be blanks. If I read this, it is always read as an object
dtype. I'm wanting to correctly change these to datetime64[ns]
, while at the same time not incorrectly converting actual numeric columns.
d = {1: {'DateCol': '02/01/2014', 'NotDateCol': 12457}, 2: {'DateCol': np.nan, 'NotDateCol': 45677}}
df = pd.DataFrame.from_dict(d,orient='index')
In [96]: df.dtypes
Out[96]:
NotDateCol int64
DateCol object
dtype: object
Ok so if you look at this, you can clearly see DateCol is a Date:
NotDateCol DateCol
1 12457 02/01/2014
2 45677 NaN
Now I need some intelligent way to convert date columns to dates, without knowing the header names, or the types beforehand
Trying to just use to_datetime coerces the integers to dates, as so:
In [97]: for col in df.columns:
df[col] = pd.to_datetime(df[col])
....:
In [98]: df
Out[98]:
NotDateCol DateCol
1 1970-01-01 00:00:00.000012457 2014-02-01
2 1970-01-01 00:00:00.000045677 NaT
In [99]: df.dtypes
Out[99]:
NotDateCol datetime64[ns]
DateCol datetime64[ns]
dtype: object
Is there any decently intelligent way to get this to work, that will correctly pick out Datetime-like columns and convert them, while not converting numbers to 1970-01-01?
Upvotes: 0
Views: 1652
Reputation: 5587
You need to coerce it. According to the documentation:
convert_dates : boolean, default True
If True, convert to date where possible. If ‘coerce’, force conversion, with unconvertible values becoming NaT.
The convert_ints
flag is False by default, so in this case:
In [51]:
d = {1: {'DateCol': '02/01/2014', 'NotDateCol': 12457}, 2: {'DateCol': np.nan, 'NotDateCol': 45677}}
df = pd.DataFrame.from_dict(d,orient='index').convert_objects(convert_dates='coerce')
In [52]:
df.dtypes
Out[52]:
NotDateCol int64
DateCol datetime64[ns]
dtype: object
Upvotes: 2