I have a df whose entries are mostly 'yyyymmdd' date strings, but there are also some 'nan' values. Note that these 'nan' are strings, not the float nan. I want to convert all the date strings in the df to pandas datetime format, while converting the 'nan' values to NaN, NaT, etc., i.e. whatever can be detected by pd.isnull.
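For reference, a minimal reproduction of such a df (built from the example data below) could look like this:

```python
import pandas as pd

# A small DataFrame of 'yyyymmdd' strings where missing values
# are the literal string 'nan', not the float NaN
df = pd.DataFrame(
    {
        "001002.XY": ["20100203", "nan"],
        "600123.AB": ["20150605", "20170801"],
        "123456.YZ": ["nan", "nan"],
        "555555.GO": ["20090501", "nan"],
    },
    index=["ipo_date", "delist_date"],
)

print(df.dtypes)  # all columns are object
```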
An immediate thought was to use the pd.to_datetime function with its errors argument. As the documentation says:

errors : {'ignore', 'raise', 'coerce'}, default 'raise'
- If 'raise', then invalid parsing will raise an exception
- If 'coerce', then invalid parsing will be set as NaT
- If 'ignore', then invalid parsing will return the input

So I tried with my example df:
001002.XY 600123.AB 123456.YZ 555555.GO
ipo_date 20100203 20150605 nan 20090501
delist_date nan 20170801 nan nan
where all values are str, and even the nan entries are actually the string 'nan'. I then tried pd.to_datetime(df, errors='coerce'), which throws:
Traceback (most recent call last):
File "D:\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2881, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-27-43c41318d6ab>", line 1, in <module>
pd.to_datetime(df, errors='coerce')
File "D:\Anaconda3\lib\site-packages\pandas\core\tools\datetimes.py", line 512, in to_datetime
result = _assemble_from_unit_mappings(arg, errors=errors)
File "D:\Anaconda3\lib\site-packages\pandas\core\tools\datetimes.py", line 591, in _assemble_from_unit_mappings
"[{0}] is missing".format(','.join(req)))
ValueError: to assemble mappings requires at least that [year, month, day] be specified: [day,month,year] is missing
However, if I tried with individual cells, it worked out fine:
pd.to_datetime(df.iloc[0, 0])
Out[33]:
Timestamp('2010-02-03 00:00:00')
pd.to_datetime(df.iloc[1, 0])
Out[34]:
NaT
I can't figure out what happened. For what it's worth, all columns have object as their dtype:
df.dtypes
Out[35]:
001002.XY object
600123.AB object
123456.YZ object
555555.GO object
dtype: object
But this doesn't seem to be the culprit: I tested with individual columns too, and they worked out fine:
pd.to_datetime(df.iloc[:, 0])
Out[36]:
ipo_date 2010-02-03
delist_date NaT
Name: 001002.XY, dtype: datetime64[ns]
Could someone help or explain? Thanks!
Upvotes: 1
Views: 4816
It should be done slightly differently: apply pd.to_datetime to each column:
In [6]: df.apply(pd.to_datetime, errors='coerce')
Out[6]:
001002.XY 600123.AB 123456.YZ 555555.GO
ipo_date 2010-02-03 2015-06-05 NaT 2009-05-01
delist_date NaT 2017-08-01 NaT NaT
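Since all valid entries share the 'yyyymmdd' layout, you could also pass an explicit format so only genuine dates parse (a sketch; the format string is inferred from the data shown, and apply forwards the keyword arguments to pd.to_datetime):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "001002.XY": ["20100203", "nan"],
        "600123.AB": ["20150605", "20170801"],
    },
    index=["ipo_date", "delist_date"],
)

# Apply column-wise with an explicit format; strings like 'nan'
# that don't match become NaT because of errors='coerce'
out = df.apply(pd.to_datetime, format="%Y%m%d", errors="coerce")
print(out)
```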
If you pass a whole DataFrame to pd.to_datetime(), it expects columns named year, month, day (and optionally hour, minute, second) in order to assemble a datetime from those separate columns; that is why the whole-DataFrame call above raised a ValueError.
From the docs:

Assembling a datetime from multiple columns of a DataFrame. The keys can be common abbreviations like ['year', 'month', 'day', 'minute', 'second', 'ms', 'us', 'ns'] or plurals of the same
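To illustrate that column-assembly behavior, here is a small sketch of the DataFrame form of pd.to_datetime working as the docs describe:

```python
import pandas as pd

# Given a DataFrame, pd.to_datetime assembles one datetime per row
# from columns named year/month/day (hour, minute, ... are optional)
parts = pd.DataFrame({"year": [2010, 2015], "month": [2, 6], "day": [3, 5]})
assembled = pd.to_datetime(parts)
print(assembled)
# Without year/month/day columns, this is the path that raises the
# "requires at least that [year, month, day] be specified" ValueError
```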
Upvotes: 4