Vim
Vim

Reputation: 1518

pandas to_datetime doesn't work as expected

I have a df whose entries are mostly 'yyyymmdd' date strings but there are also some 'nan' values. Note that here these 'nan' are strings instead of the float nan. Now I want to convert all the date strings in the df to pandas datetime format, while converting the nan values to NaN, NaT etc, whatever can be detected by pd.isnull.

An immediate thought would be using the pd.to_datetime function with the errors argument. As said in the documentation,

errors : {'ignore', 'raise', 'coerce'}, default 'raise'
- If 'raise', then invalid parsing will raise an exception
- If 'coerce', then invalid parsing will be set as NaT
- If 'ignore', then invalid parsing will return the input

So, I tried with my example df

            001002.XY 600123.AB 123456.YZ 555555.GO
ipo_date     20100203  20150605       nan  20090501
delist_date       nan  20170801       nan       nan

where all values are str and even nan are actually 'nan'. I then tried pd.to_datetime(df, errors='coerce'), which throws me:

Traceback (most recent call last):
  File "D:\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2881, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-27-43c41318d6ab>", line 1, in <module>
    pd.to_datetime(df, errors='coerce')
  File "D:\Anaconda3\lib\site-packages\pandas\core\tools\datetimes.py", line 512, in to_datetime
    result = _assemble_from_unit_mappings(arg, errors=errors)
  File "D:\Anaconda3\lib\site-packages\pandas\core\tools\datetimes.py", line 591, in _assemble_from_unit_mappings
    "[{0}] is missing".format(','.join(req)))
ValueError: to assemble mappings requires at least that [year, month, day] be specified: [day,month,year] is missing

However, if I tried with individual cells, it worked out fine:

pd.to_datetime(df.iloc[0, 0])
Out[33]: 
Timestamp('2010-02-03 00:00:00')
pd.to_datetime(df.iloc[1, 0])
Out[34]: 
NaT

I can't figure out what happened. For what it's worth though, all columns have object as dtype:

df.dtypes
Out[35]: 
001002.XY    object
600123.AB    object
123456.YZ    object
555555.GO    object
dtype: object

But this doesn't seem to be the culprit: I tested with individual columns too, and they worked out fine:

pd.to_datetime(df.iloc[:, 0])
Out[36]: 
ipo_date      2010-02-03
delist_date          NaT
Name: 001002.XY, dtype: datetime64[ns]

Could someone help or explain? Thanks!

Upvotes: 1

Views: 4816

Answers (1)

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210982

It should be done slightly differently - applying pd.to_datetime to all columns:

In [6]: df.apply(pd.to_datetime, errors='coerce')
Out[6]:
             001002.XY  600123.AB 123456.YZ  555555.GO
ipo_date    2010-02-03 2015-06-05       NaT 2009-05-01
delist_date        NaT 2017-08-01       NaT        NaT

If you pass a DataFrame to pd.to_datetime() - it expects columns like (year, month, day and optionally: hour, minute, second) in order to combine a datetime from separate columns.

From docs:

Assembling a datetime from multiple columns of a DataFrame. The keys can be common abbreviations like [year, month, day, minute, second, ms, us, ns]) or plurals of the same

Upvotes: 4

Related Questions