Reputation: 1391
I'm using the pd.to_datetime(df['Date'], format="%d/%m/%Y")
method to try and convert the column to a datetime column.
Now, the dates are of the form day/month/year, but in varying different formats, for example:
5/8/2005
13/08/05
4/2/2006
I get an error: time data '13/08/05' does not match format '%d/%m/%Y' (match)
when I try and use pd.to_datetime()
, and when I remove the formatting, pandas incorrectly transforms the dates.
How do I deal with this?
Upvotes: 1
Views: 65
Reputation: 3158
You cannot directly handle that. You can see details on (I did not know about datetime
formatting and notice that there is a %y
and %Y
directive, meaning 2-digit and 4-digit year, respectively.dayfirst
, but I think the apply
use case is still helpful for others.)
I would suggest creating a small function and then using that function on a column apply. Something like this:
>>> def convertYear(val):
... (day, month, year) = val.split('/')
... if len(year) == 2:
... if int(year) > 40:
... year = '19' + year
... else:
... year = '20' + year
... newvals = [day, month, year]
... return '/'.join(newvals)
... return val
Then you can call this function on the column of interest; something like:
df.Date.apply(convertYear, axis='index') # don't use axis=0, not readability-friendly
Now you can finally finish with:
pd.to_datetime(df['Date'], format="%d/%m/%Y"
Be warned: even though my function tries to be clever, it is assuming that you have European dates (DD/MM/YYYY), I assume this because of 13/08/05
. In the US, it is (MM/DD/YYYY).
Because of this annoyance, proper datetimes should be internationalized, which is really just the East Asian style. (Maybe all of Asia??)
Upvotes: 0
Reputation: 641
I was stuck in a similar problem once, where there were certain cases with a combination of "Month First" and "Day First". I tackled the problem using the following code.
df=pd.DataFrame({"A":["5/8/2005","13/08/05","4/2/2006"]})
df.loc[:,"IsMonth"]=df.loc[:,"A"].apply(lambda x: int(x.split("/")[0])>12)
df_1=df[df.loc[:,"IsMonth"]==True]
df_2=df[~df.index.isin(df_1.index)]
df_1["A"]=pd.to_datetime(df_1["A"])
df_2["A"]=pd.to_datetime(df_2["A"])
df=pd.concat([df_1,df_2])
df.drop("IsMonth", inplace=True, axis=1)
Please note that the default behavior of pd.to_datetime()
is to use dayfirst=False
Upvotes: 0
Reputation: 15872
Use dayfirst
keyword in pd.to_datetime
:
>>> pd.to_datetime(df['Date'], dayfirst=True)
0 2005-08-05
1 2005-08-13
2 2006-02-04
dtype: datetime64[ns]
Upvotes: 2