Thelurker Lurker
Thelurker Lurker

Reputation: 55

Pandas date formatting with different multiple date formats problems

I am trying to convert a column in my dataframe to dates, which are meant to be birthdays. The data was manually captured over a period of years with different formats. I cant get Pandas to format the whole column correctly.

formats include:

YYYYMMDD
DDMMYYYY
DD/MM/YYYY
DD-MMM-YYYY (eg JAN)

I have tried

dates['BIRTH-DATE(MAIN)'] = pd.to_datetime(dates['BIRTH-DATE(MAIN)'])

but i get the error

ValueError: year 19670314 is out of range

Not sure how I can get it to include multiple date formats?

Upvotes: 4

Views: 466

Answers (2)

Chris Adams
Chris Adams

Reputation: 18647

You could create your own function to handle this. For example, something like:

df = pd.DataFrame({'date': {0: '20180101', 1: '01022018', 2: '01/02/2018', 3: '01-JAN-2018'}})

def fix_date(series, patterns=['%Y%m%d', '%d%m%Y', '%d/%m/%Y', '%d-%b-%Y']):
    datetimes = []
    for pat in patterns:
        datetimes.append(pd.to_datetime(series, format=pat, errors='coerce'))
    return pd.concat(datetimes, axis=1).ffill(axis=1).iloc[:, -1]

df['fixed_dates'] = fix_date(df['date'])

[out]

print(df)    

          date fixed_dates
0     20180101  2018-01-01
1     01022018  2018-02-01
2   01/02/2018  2018-02-01
3  01-JAN-2018  2018-01-01

Upvotes: 4

JoergVanAken
JoergVanAken

Reputation: 1286

In my eyes pandas is really good in converting dates but it is nearly impossible to guess always the right format automatically. Use pd.to_datetime with the option errors='coerce' and check the dates which were not converted by hand.

Upvotes: 2

Related Questions