Reputation: 55
I am trying to convert a column in my dataframe to dates, which are meant to be birthdays. The data was manually captured over a period of years with different formats. I cant get Pandas to format the whole column correctly.
formats include:
YYYYMMDD
DDMMYYYY
DD/MM/YYYY
DD-MMM-YYYY (eg JAN)
I have tried
dates['BIRTH-DATE(MAIN)'] = pd.to_datetime(dates['BIRTH-DATE(MAIN)'])
but i get the error
ValueError: year 19670314 is out of range
Not sure how I can get it to include multiple date formats?
Upvotes: 4
Views: 466
Reputation: 18647
You could create your own function to handle this. For example, something like:
df = pd.DataFrame({'date': {0: '20180101', 1: '01022018', 2: '01/02/2018', 3: '01-JAN-2018'}})
def fix_date(series, patterns=['%Y%m%d', '%d%m%Y', '%d/%m/%Y', '%d-%b-%Y']):
datetimes = []
for pat in patterns:
datetimes.append(pd.to_datetime(series, format=pat, errors='coerce'))
return pd.concat(datetimes, axis=1).ffill(axis=1).iloc[:, -1]
df['fixed_dates'] = fix_date(df['date'])
[out]
print(df)
date fixed_dates
0 20180101 2018-01-01
1 01022018 2018-02-01
2 01/02/2018 2018-02-01
3 01-JAN-2018 2018-01-01
Upvotes: 4
Reputation: 1286
In my eyes pandas
is really good in converting dates but it is nearly impossible to guess always the right format automatically. Use pd.to_datetime
with the option errors='coerce'
and check the dates which were not converted by hand.
Upvotes: 2