Reputation: 11
I am using the pd.to_datetime()
method to convert a column of strings to datetime objects.
import pandas as pd import numpy as np
main = pd.read_csv('correctedstats.csv', nrows = 50) main['Date'] =
pd.to_datetime(main['Date'], format = '%d/%m/%y')
Which is giving me the following error-
Traceback (most recent call last): File "/Users/pranavmarla/jama/lib/python3.6/site-packages/pandas/core/tools/datetimes.py", line 448, in _convert_listlike_datetimes values, tz = conversion.datetime_to_datetime64(arg) File "pandas/_libs/tslibs/conversion.pyx", line 200, in pandas._libs.tslibs.conversion.datetime_to_datetime64 TypeError: Unrecognized value type:
During handling of the above exception, another exception occurred:
Traceback (most recent call last): File "mergeDates.py", line 8, in main['Date'] = pd.to_datetime(main['Date'].astype(str), format = '%d/%m/%y') File "/Users/pranavmarla/jama/lib/python3.6/site-packages/pandas/util/_decorators.py", line 208, in wrapper return func(*args, **kwargs) File "/Users/pranavmarla/jama/lib/python3.6/site-packages/pandas/core/tools/datetimes.py", line 778, in to_datetime values = convert_listlike(arg._values, True, format) File "/Users/pranavmarla/jama/lib/python3.6/site-packages/pandas/core/tools/datetimes.py", line 451, in _convert_listlike_datetimes raise e File "/Users/pranavmarla/jama/lib/python3.6/site-packages/pandas/core/tools/datetimes.py", line 416, in _convert_listlike_datetimes arg, format, exact=exact, errors=errors File "pandas/_libs/tslibs/strptime.pyx", line 148, in pandas._libs.tslibs.strptime.array_strptime ValueError: unconverted data remains: 00
The date is in the format of '03/01/00'
The code is running fine if us the date's text into the function in the following way
pd.to_datetime('03/01/00', format = '%d/%m/%y')
What am I doing wrong here?
Edit - companystats.csv -
0 1.0 34.0 Asian Paints 15.90 INE021A01026 03/01/00
1 2.0 34.0 Asian Paints 16.06 INE021A01026 04/01/00
2 3.0 34.0 Asian Paints 15.96 INE021A01026 05/01/00
3 4.0 34.0 Asian Paints 15.73 INE021A01026 06/01/00
4 5.0 34.0 Asian Paints 16.07 INE021A01026 07/01/00
5 6.0 34.0 Asian Paints 17.29 INE021A01026 10/01/00
6 7.0 34.0 Asian Paints 17.30 INE021A01026 11/01/00
7 8.0 34.0 Asian Paints 17.40 INE021A01026 12/01/00
8 9.0 34.0 Asian Paints 17.45 INE021A01026 13/01/2000
9 10.0 34.0 Asian Paints 17.08 INE021A01026 14/01/2000
10 11.0 34.0 Asian Paints 17.14 INE021A01026 17/01/2000
11 12.0 34.0 Asian Paints 17.25 INE021A01026 18/01/2000
12 13.0 34.0 Asian Paints 17.47 INE021A01026 19/01/2000
13 14.0 34.0 Asian Paints 17.52 INE021A01026 20/01/2000
14 15.0 34.0 Asian Paints 17.74 INE021A01026 21/01/2000
15 16.0 34.0 Asian Paints 19.10 INE021A01026 24/01/2000
16 17.0 34.0 Asian Paints 20.61 INE021A01026 25/01/2000
17 18.0 34.0 Asian Paints 19.71 INE021A01026 27/01/2000
18 19.0 34.0 Asian Paints 20.03 INE021A01026 28/01/2000
19 20.0 34.0 Asian Paints 19.19 INE021A01026 31/01/2000
20 21.0 34.0 Asian Paints 19.18 INE021A01026 01/02/00
21 22.0 34.0 Asian Paints 19.17 INE021A01026 02/02/00
22 23.0 34.0 Asian Paints 19.00 INE021A01026 03/02/00
23 24.0 34.0 Asian Paints 19.03 INE021A01026 04/02/00
EDIT - Solved! My date column had two types of dates;
21/01/2000
and
02/02/00
I used the errors attribute of the pd.to_datetime() method to create two seperate columns which i then merged to give me my desired column
pd.to_datetime(main,format = '%d/%m/%y', errors = 'coerce')
and
pd.to_datetime(main,format = '%d/%m/%Y', errors = 'coerce')
A more graceful solution would be welcome
Upvotes: 1
Views: 6909
Reputation: 98
You can try passing format='mixed'
to the to_datetime
method like so:
df['Date'] = pd.to_datetime(df['Date'], format='mixed').dt.date
If format is set to 'mixed'
, Pandas will infer each date format individually based. Fair warning, though. They recommend using it with 'dayfirst'
to be safe.
Upvotes: 0
Reputation: 21719
Here's a nice way to see where the function is failing:
df = pd.DataFrame({'date': ['03/01/00','13/01/2000','00/01/00']})
def date_format(f):
try:
return pd.to_datetime(f)
except Exception as e:
return e.args
print(df['date'].apply(date_format))
0 2000-03-01 00:00:00
1 2000-01-13 00:00:00
2 (month must be in 1..12,)
Upvotes: 1
Reputation: 2868
try to pass infer_datetime_format as true, it might be that in your data, date it is not in the format that you are passing
pd.to_datetime(df['Date'], infer_datetime_format=True)
OR pass errors = 'ignore'
pd.to_datetime(df['Date'], format = '%d/%m/%y',errors='ignore')
Upvotes: 0