Aman Singh

Reputation: 1241

Filling missing date values with the least possible date in Pandas dataframe

I have a dataframe with a date column:

df = pd.DataFrame({'date':['2014-10-01', np.nan, '2015-09-30', np.nan, np.nan, '2019-06-03']})

Now I want to impute the missing date values with the least possible date value in pandas. Imputing the current date is easy with datetime.now(), but for one particular case I want the NaN values to be imputed with the smallest date that pandas can represent.

datetime allows a minimum date of '0001-01-01', but pandas does not accept it. When I impute this value, the error I get is:

OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1-01-01 00:00:00

I tried searching Stack Overflow but couldn't find an answer that gives the minimum acceptable date in pandas.

Is anyone aware of this?

EDIT: I'm not really concerned with the OutOfBoundsDatetime error itself; I'm curious to know the least possible date that pandas can accept.

Upvotes: 3

Views: 3409

Answers (1)

cs95

Reputation: 402493

If you want a date that plays nicely with pandas, you'll need to consider pd.Timestamp, since this is the datetime type that pandas works with.

If you don't mind your dates having a time component, use pd.Timestamp.min:

pd.Timestamp.min
# Timestamp('1677-09-21 00:12:43.145225')

pd.to_datetime(df['date'].fillna(pd.Timestamp.min))

0   2014-10-01 00:00:00.000000
1   1677-09-21 00:12:43.145225
2   2015-09-30 00:00:00.000000
3   1677-09-21 00:12:43.145225
4   1677-09-21 00:12:43.145225
5   2019-06-03 00:00:00.000000
Name: date, dtype: datetime64[ns]
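For context, here is a rough sketch of where that bound comes from. It assumes pandas' default nanosecond resolution, where a timestamp is stored as a 64-bit integer count of nanoseconds since 1970-01-01, so the representable range is roughly 292 years on either side of 1970. The names NS_PER_YEAR and span_years are only illustrative.

```python
import numpy as np
import pandas as pd

# Approximate number of nanoseconds in a year (using 365.25 days per year)
NS_PER_YEAR = 365.25 * 24 * 3600 * 1e9

# How many years a signed 64-bit nanosecond counter can reach from the epoch
span_years = np.iinfo(np.int64).max / NS_PER_YEAR

# The bounds pandas reports for nanosecond-resolution timestamps
lower, upper = pd.Timestamp.min, pd.Timestamp.max

print(round(span_years, 2))
print(lower, upper)
```

Anything earlier than pd.Timestamp.min, such as '0001-01-01', falls outside this range, which is what the OutOfBoundsDatetime message is reporting.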

If you only want the dates (without times), then the smallest date without a time component would be:

pd.Timestamp.min.ceil('D')
# Timestamp('1677-09-22 00:00:00')

pd.to_datetime(df['date'].fillna(pd.Timestamp.min.ceil('D')))

0   2014-10-01
1   1677-09-22
2   2015-09-30
3   1677-09-22
4   1677-09-22
5   2019-06-03
Name: date, dtype: datetime64[ns]
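Putting it together, a minimal self-contained sketch might look like the following. It parses the column first, so the missing entries become NaT, and then fills them; this ordering is a variation on the snippets above rather than the only way to do it, and the names floor_date and filled are just placeholders.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'date': ['2014-10-01', np.nan, '2015-09-30', np.nan, np.nan, '2019-06-03']}
)

# Earliest whole day that fits in a nanosecond-resolution timestamp
floor_date = pd.Timestamp.min.ceil('D')

# Parse the strings (NaN becomes NaT), then replace NaT with the floor date
filled = pd.to_datetime(df['date']).fillna(floor_date)

print(filled)
```

Parsing before filling keeps the sentinel as a real Timestamp throughout, so the column never holds a mix of strings and timestamps.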

Upvotes: 3
