Reputation: 673
I am trying to convert a column of DateTime Strings into a Pandas-comprehensible Datetime Format. Of course I already googled and tried several solutions. Convert Pandas Column to DateTime
This one appeared the most encouraging for me but both recommended ways did not work for my dataset. The details: Dataset name: co,
Column: index-column,
Format: 15.07.2015 24:00 with no more blancs before or after.
My efforts:
co['newdate'] = pd.to_datetime(co.index, format='%d.%m.%Y %H:%M')
The next one I tried after I transformed the Index-col to a "normal" column named "Datum"
co['newdate'] = co['Datum'].apply(lambda x: dt.datetime.strptime(x,'%d.%m.%Y %H:%M'))
The error: time data '15.07.2015 24:00' does not match format '%d.%m.%Y %H:%M'
this error occurs in both solutions. Anyone an idea?
Upvotes: 2
Views: 8770
Reputation: 394409
Your format string is correct but your data is not, 24
is not a valid hour component hence the error:
In [138]:
pd.to_datetime('15.07.2015 24:00', format = '%d.%m.%Y %H:%M')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
C:\WinPython-64bit-3.4.3.5\python-3.4.3.amd64\lib\site-packages\pandas\tseries\tools.py in _convert_listlike(arg, box, format)
329 try:
--> 330 values, tz = tslib.datetime_to_datetime64(arg)
331 return DatetimeIndex._simple_new(values, None, tz=tz)
pandas\tslib.pyx in pandas.tslib.datetime_to_datetime64 (pandas\tslib.c:23823)()
TypeError: Unrecognized value type: <class 'str'>
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
<ipython-input-138-1546fb6950f0> in <module>()
----> 1 pd.to_datetime('15.07.2015 24:00', format = '%d.%m.%Y %H:%M')
C:\WinPython-64bit-3.4.3.5\python-3.4.3.amd64\lib\site-packages\pandas\tseries\tools.py in to_datetime(arg, errors, dayfirst, utc, box, format, exact, coerce, unit, infer_datetime_format)
343 return _convert_listlike(arg, box, format)
344
--> 345 return _convert_listlike(np.array([ arg ]), box, format)[0]
346
347 class DateParseError(ValueError):
C:\WinPython-64bit-3.4.3.5\python-3.4.3.amd64\lib\site-packages\pandas\tseries\tools.py in _convert_listlike(arg, box, format)
331 return DatetimeIndex._simple_new(values, None, tz=tz)
332 except (ValueError, TypeError):
--> 333 raise e
334
335 if arg is None:
C:\WinPython-64bit-3.4.3.5\python-3.4.3.amd64\lib\site-packages\pandas\tseries\tools.py in _convert_listlike(arg, box, format)
305 try:
306 result = tslib.array_strptime(
--> 307 arg, format, exact=exact, coerce=coerce
308 )
309 except (tslib.OutOfBoundsDatetime):
pandas\tslib.pyx in pandas.tslib.array_strptime (pandas\tslib.c:39900)()
ValueError: time data '15.07.2015 24:00' does not match format '%d.%m.%Y %H:%M' (match)
whilst 23:59
will parse fine
In [139]:
pd.to_datetime('15.07.2015 23:59', format = '%d.%m.%Y %H:%M')
Out[139]:
Timestamp('2015-07-15 23:59:00')
You need to replace 24
with 0
in order for this to parse:
In [140]:
pd.to_datetime('15.07.2015 00:00', format = '%d.%m.%Y %H:%M')
Out[140]:
Timestamp('2015-07-15 00:00:00')
So what you can do is call the vectorised str.replace
to fix these errant hour values:
In [144]:
df = pd.DataFrame({'date':['15.07.2015 24:00']})
print(df)
df['date'] = df['date'].str.replace('24:','00:')
print(df)
pd.to_datetime(df['date'], format = '%d.%m.%Y %H:%M')
date
0 15.07.2015 24:00
date
0 15.07.2015 00:00
Out[144]:
0 2015-07-15
Name: date, dtype: datetime64[ns]
So in your case you can do:
co.index = co.index.str.replace('24:','00:')
and then convert as before
Upvotes: 5