sapo_cosmico
sapo_cosmico

Reputation: 6524

Pandas to_datetime has inconsistent behavior on non-american dates

I am confused by the behavior of pandas to_datetime on non-American dates.

In this trivial example, Pandas correctly infers the months on both the 2nd and 3rd row, but fails on the 1st and 4th.

Apparently it is treating the 2nd and 3rd line as dd/mm/yyyy dates (because the 13 and 27 clearly can't be months), but treating the remaining dates as being mm/dd/yyyy.

My expectation would be that to_datetime would infer from the entire series, and then treat every entry the same.

import pandas as pd 
results = pd.DataFrame()

european_dates = pd.Series(['05/04/2007',   # <-- April 5th, 2007
                            '13/04/2006',   # <-- April 13th, 2006
                            '27/12/2014',   # <-- December 27th, 2014
                            '02/07/2010'])  # <-- July 2nd, 2010 

# note: the same happens with infer_datetime_format=False
inferred_dates = pd.to_datetime(european_dates,
                                infer_datetime_format=True) 

results['day'] = inferred_dates.dt.day
results['month'] = inferred_dates.dt.month
results['year'] = inferred_dates.dt.year

results

enter image description here

Note: I'm aware that to_datetime has a dayfirst argument, and a format argument, my question is mostly about why the infer_datetime_format fails in such a trivial case.

Upvotes: 3

Views: 1224

Answers (1)

BENY
BENY

Reputation: 323306

Using dayfirst in to_datetime

european_dates = pd.Series(['05/04/2007',   # <-- April 5th, 2007
                            '13/04/2006',   # <-- April 13th, 2006
                            '27/12/2014',   # <-- December 27th, 2014
                            '02/07/2010'])  # <-- July 2nd, 2010 
inferred_dates = pd.to_datetime(european_dates,dayfirst =True) 
results = pd.DataFrame()
results['day'] = inferred_dates.dt.day
results['month'] = inferred_dates.dt.month
results['year'] = inferred_dates.dt.year
results
Out[109]: 
   day  month  year
0    5      4  2007
1   13      4  2006
2   27     12  2014
3    2      7  2010

Upvotes: 1

Related Questions