user1610719
user1610719

Reputation: 1303

Coercing Date Columns in Pandas with null values

I'm reading Excel files and need to properly handle dates when reading them. Oftentimes columns will be sparsely populated with dates, and the rest will be blanks. If I read this, it is always read as an object dtype. I'm wanting to correctly change these to datetime64[ns], while at the same time not incorrectly converting actual numeric columns.

d = {1: {'DateCol': '02/01/2014', 'NotDateCol': 12457}, 2: {'DateCol': np.nan, 'NotDateCol': 45677}}
df = pd.DataFrame.from_dict(d,orient='index')
In [96]: df.dtypes
Out[96]: 
NotDateCol     int64
DateCol       object
dtype: object

Ok so if you look at this, you can clearly see DateCol is a Date:

  NotDateCol     DateCol
1      12457  02/01/2014
2      45677         NaN

Now I need some intelligent way to convert date columns to dates, without knowing the header names, or the types beforehand

Trying to just use to_datetime coerces the integers to dates, as so:

In [97]: for col in df.columns:
    df[col]  = pd.to_datetime(df[col])
   ....:     

In [98]: df
Out[98]: 
                     NotDateCol    DateCol
1 1970-01-01 00:00:00.000012457 2014-02-01
2 1970-01-01 00:00:00.000045677        NaT

In [99]: df.dtypes
Out[99]: 
NotDateCol    datetime64[ns]
DateCol       datetime64[ns]
dtype: object

Is there any decently intelligent way to get this to work, that will correctly pick out Datetime-like columns and convert them, while not converting numbers to 1970-01-01?

Upvotes: 0

Views: 1652

Answers (1)

rll
rll

Reputation: 5587

You need to coerce it. According to the documentation:

convert_dates : boolean, default True

If True, convert to date where possible. If ‘coerce’, force conversion, with unconvertible values becoming NaT.

The convert_ints flag is False by default, so in this case:

In [51]:
d = {1: {'DateCol': '02/01/2014', 'NotDateCol': 12457}, 2: {'DateCol': np.nan, 'NotDateCol': 45677}}
df = pd.DataFrame.from_dict(d,orient='index').convert_objects(convert_dates='coerce')

In [52]:
df.dtypes

Out[52]:
NotDateCol             int64
DateCol       datetime64[ns]
dtype: object

Upvotes: 2

Related Questions