Ison

Reputation: 403

How can I handle a wrong year format?

I'm new to Python and pandas and ran into the following problem. My dataframe has a column of dates (yyyy-mm-ddThh-mm-sec). Most of the years are fine (e.g. 2008), but in some rows the year is written as 0008. Because of this, I can't convert the column with pd.to_datetime.

My idea was to convert it to a 2-digit year first (using pd.to_datetime(df['date']).dt.strftime('%y %b, %d %H:%M:%S.%f +%Z')), but I got the error Out of bounds nanosecond timestamp: 08-10-02 14:41:00.

Is there another way to convert 0008 to 2008 in the dataframe?

Thanks for the help in advance

Upvotes: 2

Views: 256

Answers (1)

It_is_Chris

Reputation: 14113

If the bad data always has the same format (that is, the year is always 4 characters long), you can use the str accessor to slice off the first two characters, so that only a 2-digit year is left to parse:

import pandas as pd

df = pd.DataFrame({'date':['2008-01-01', '0008-01-02']})
# drop the first two characters of every value, then parse with the year first
df['date'] = pd.to_datetime(df['date'].str[2:], yearfirst=True)

    date
0   2008-01-01
1   2008-01-02
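
If you would rather keep the full 4-digit year, here is a minimal sketch of a different approach (not the slicing shown above): replace a leading "00" with "20" and then parse with an explicit format. The sample values and the format string are assumptions; they suppose the time part uses colons, as in the error message from the question, and that every bad year belongs in 20xx.

```python
import pandas as pd

# Hypothetical sample data, assuming an ISO-like layout with a "T" separator
df = pd.DataFrame({'date': ['2008-10-02T14:41:00', '0008-10-02T14:41:00']})

# Rewrite a leading "00" as "20"; rows that already start with "20" are untouched
fixed = df['date'].str.replace(r'^00', '20', regex=True)

# An explicit format avoids relying on pandas' format inference
df['date'] = pd.to_datetime(fixed, format='%Y-%m-%dT%H:%M:%S')
```

If some of the bad years belong to a different century, the replacement string would have to be adjusted.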

Upvotes: 5
