user3471259
user3471259

Reputation: 225

Pandas OutOfBoundsDatetime when cleaning different date formats with dates before 1677

With Pandas I am using this answer to clean up dates with a variety of formats. This works perfectly if I filter out the dates that are prior to 1677. However my dates are historic and many date before 1677 so I get an OutOfBoundsDatetime error.

My data contains dates like:

27 Feb 1928,
1920,
October 2000,
1500,
1625,
Mar 1723

I can see a reference here to using pd.Period but I don't know how to apply it to my case as the dates need to be cleaned first before I can adapt this sample

My code to clean the dates is:

df['clean_date'] = df.dates.apply(
lambda x: pd.to_datetime(x).strftime('%m/%d/%Y'))

df

I would like help to convert and clean my dates including the historic dates. Grateful for assistance with this.

Upvotes: 2

Views: 598

Answers (1)

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210882

As it is clearly stated in the online documentation you can't have values of datetime64[ns] dtype that are not falling into ['1677-09-21 00:12:43.145225', '2262-04-11 23:47:16.854775807'].

But you can have such dates as Period dtype.

Sample input dataset:

In [156]: df
Out[156]:
           Date
0   27 Feb 1928
1          1920
2  October 2000
3          1500
4          1625
5      Mar 1723

In [157]: df.dtypes
Out[157]:
Date    object
dtype: object

Solution:

In [158]: df["new"] = pd.PeriodIndex([pd.Period(d, freq="D") for d in df.Date])

Result:

In [159]: df
Out[159]:
           Date         new
0   27 Feb 1928  1928-02-27
1          1920  1920-01-01
2  October 2000  2000-10-01
3          1500  1500-01-01
4          1625  1625-01-01
5      Mar 1723  1723-03-01

In [160]: df.dtypes
Out[160]:
Date       object
new     period[D]
dtype: object

In [161]: df["new"].dt.year
Out[161]:
0    1928
1    1920
2    2000
3    1500
4    1625
5    1723
Name: new, dtype: int64

Upvotes: 3

Related Questions