Reputation: 225
With Pandas I am using this answer to clean up dates with a variety of formats. This works perfectly if I filter out the dates that are prior to 1677. However my dates are historic and many date before 1677 so I get an OutOfBoundsDatetime error.
My data contains dates like:
27 Feb 1928,
1920,
October 2000,
1500,
1625,
Mar 1723
I can see a reference here to using pd.Period but I don't know how to apply it to my case as the dates need to be cleaned first before I can adapt this sample
My code to clean the dates is:
df['clean_date'] = df.dates.apply(
lambda x: pd.to_datetime(x).strftime('%m/%d/%Y'))
df
I would like help to convert and clean my dates including the historic dates. Grateful for assistance with this.
Upvotes: 2
Views: 598
Reputation: 210882
As it is clearly stated in the online documentation you can't have values of datetime64[ns]
dtype that are not falling into ['1677-09-21 00:12:43.145225', '2262-04-11 23:47:16.854775807']
.
But you can have such dates as Period dtype.
Sample input dataset:
In [156]: df
Out[156]:
Date
0 27 Feb 1928
1 1920
2 October 2000
3 1500
4 1625
5 Mar 1723
In [157]: df.dtypes
Out[157]:
Date object
dtype: object
Solution:
In [158]: df["new"] = pd.PeriodIndex([pd.Period(d, freq="D") for d in df.Date])
Result:
In [159]: df
Out[159]:
Date new
0 27 Feb 1928 1928-02-27
1 1920 1920-01-01
2 October 2000 2000-10-01
3 1500 1500-01-01
4 1625 1625-01-01
5 Mar 1723 1723-03-01
In [160]: df.dtypes
Out[160]:
Date object
new period[D]
dtype: object
In [161]: df["new"].dt.year
Out[161]:
0 1928
1 1920
2 2000
3 1500
4 1625
5 1723
Name: new, dtype: int64
Upvotes: 3