Reputation: 1241
I have a DataFrame with a date column:
df = pd.DataFrame({'date':['2014-10-01', np.nan, '2015-09-30', np.nan, np.nan, '2019-06-03']})
Now I want to impute the missing date values with the least possible date value in pandas. Imputing the current date is easy with datetime.now(), but for one particular case I want the NaN values to be imputed with the smallest possible date.
datetime allows a minimum date of '0001-01-01', but pandas does not accept it. When I impute this value, the error I get is:
OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1-01-01 00:00:00
I searched Stack Overflow but couldn't find an answer on the minimum acceptable date in pandas.
Is anyone aware of this?
EDIT: I'm not really concerned with the OutOfBoundsDatetime error itself; I'm curious to know the earliest date that pandas can accept.
Upvotes: 3
Views: 3409
Reputation: 402493
If you want a date that plays nicely with pandas, you'll need to consider pd.Timestamp, since this is the datetime type that pandas works with.
If you don't mind your dates having a time component, use pd.Timestamp.min:
pd.Timestamp.min
# Timestamp('1677-09-21 00:12:43.145225')
pd.to_datetime(df['date'].fillna(pd.Timestamp.min))
0 2014-10-01 00:00:00.000000
1 1677-09-21 00:12:43.145225
2 2015-09-30 00:00:00.000000
3 1677-09-21 00:12:43.145225
4 1677-09-21 00:12:43.145225
5 2019-06-03 00:00:00.000000
Name: date, dtype: datetime64[ns]
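As a rough sketch of where that 1677 bound comes from: the error message mentions a nanosecond timestamp, and the datetime64[ns] dtype stores each value as a signed 64-bit count of nanoseconds from 1970-01-01. Assuming an average year of 365.25 days (an approximation chosen only for this illustration), the representable span works out to about 292 years on either side of 1970:

```python
import numpy as np
import pandas as pd

# Largest nanosecond count a signed 64-bit integer can hold.
max_ns = np.iinfo(np.int64).max

# Nanoseconds in an average (365.25-day) year; an approximation for illustration.
ns_per_year = 365.25 * 24 * 60 * 60 * 1e9
span_years = max_ns / ns_per_year

print(round(span_years))      # about 292 years on each side of 1970
print(pd.Timestamp.min.year)  # 1677
print(pd.Timestamp.max.year)  # 2262
```

The exact sub-second digits printed for pd.Timestamp.min may differ between pandas versions, but the year bounds stay the same.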
If you only want the dates (without times), then the smallest date without a time component is:
pd.Timestamp.min.ceil('D')
# Timestamp('1677-09-22 00:00:00')
pd.to_datetime(df['date'].fillna(pd.Timestamp.min.ceil('D')))
0 2014-10-01
1 1677-09-22
2 2015-09-30
3 1677-09-22
4 1677-09-22
5 2019-06-03
Name: date, dtype: datetime64[ns]
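For completeness, here is a minimal end-to-end sketch using the DataFrame from the question. It parses the column first and fills afterwards, which is a variation on the snippets above rather than the same call order; the variable name min_date is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'date': ['2014-10-01', np.nan, '2015-09-30',
                            np.nan, np.nan, '2019-06-03']})

# Earliest midnight that still fits in pandas' nanosecond range.
min_date = pd.Timestamp.min.ceil('D')

# Parse first (NaN becomes NaT), then fill the gaps with the sentinel date.
df['date'] = pd.to_datetime(df['date']).fillna(min_date)

print(df['date'].min())
print(df['date'].isna().sum())
```

Parsing before filling keeps the column a proper datetime dtype throughout, so the fill value never has to share an object column with raw strings.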
Upvotes: 3