Reputation: 6524
I am confused by the behavior of pandas to_datetime
on non-American dates.
In this trivial example, Pandas correctly infers the months on both the 2nd and 3rd row, but fails on the 1st and 4th.
Apparently it is treating the 2nd and 3rd line as dd/mm/yyyy
dates (because the 13 and 27 clearly can't be months), but treating the remaining dates as being mm/dd/yyyy
.
My expectation would be that to_datetime
would infer from the entire series, and then treat every entry the same.
import pandas as pd
results = pd.DataFrame()
european_dates = pd.Series(['05/04/2007', # <-- April 5th, 2007
'13/04/2006', # <-- April 13th, 2006
'27/12/2014', # <-- December 27th, 2014
'02/07/2010']) # <-- July 2nd, 2010
# note: the same happens with infer_datetime_format=False
inferred_dates = pd.to_datetime(european_dates,
infer_datetime_format=True)
results['day'] = inferred_dates.dt.day
results['month'] = inferred_dates.dt.month
results['year'] = inferred_dates.dt.year
results
Note: I'm aware that to_datetime
has a dayfirst
argument, and a format
argument, my question is mostly about why the infer_datetime_format
fails in such a trivial case.
Upvotes: 3
Views: 1224
Reputation: 323306
Using dayfirst
in to_datetime
european_dates = pd.Series(['05/04/2007', # <-- April 5th, 2007
'13/04/2006', # <-- April 13th, 2006
'27/12/2014', # <-- December 27th, 2014
'02/07/2010']) # <-- July 2nd, 2010
inferred_dates = pd.to_datetime(european_dates,dayfirst =True)
results = pd.DataFrame()
results['day'] = inferred_dates.dt.day
results['month'] = inferred_dates.dt.month
results['year'] = inferred_dates.dt.year
results
Out[109]:
day month year
0 5 4 2007
1 13 4 2006
2 27 12 2014
3 2 7 2010
Upvotes: 1