Reputation: 53
I have a data frame with time series data. In one column I have signup dates, and in the other cancel dates. I want to add a date for missing cancel dates that is less than a specific date, but maximum 40 weeks.
How should I proceed?
if df['cancel_date']
is NaT
, then add date max. + 40 weeks.
df['cancel_date'] - df['signup_date']
should not be less than 0.
Upvotes: 0
Views: 51
Reputation: 6260
I would use numpy.where
, if you want to append a difference column directly between singup date and cancel date:
df['date difference between signup and cancel'] = np.where(df['cancel_date'] == np.nan, (df['signup_date'].max() + pd.Timedelta(40, 'W'))-df['signup_date'], df['cancel_date']-df['signup_date'])
This will give you a new column where you would have directly the difference between the signup date and the cancel date
Upvotes: -1
Reputation: 18647
IIUC, you can use Series.fillna
with pandas.Timedelta
class.
If adding 40 weeks to the records signup_date
:
df['cancel_date'] = df['cancel_date'].fillna(df['signup_date'] + pd.Timedelta(40, 'W'))
If adding 40 weeks to maximum date in the sign_up
column:
df['cancel_date'] = df['cancel_date'].fillna(df['signup_date'].max() + pd.Timedelta(40, 'W'))
Or if using some predefined max date
value, with the constraint that signup_date < cancel_date
, chain on the clip
method:
max_date = pd.datetime(2018, 4, 30)
df['cancel_date'] = df['cancel_date'].fillna(max_date + pd.Timedelta(40, 'W')).clip(lower=df.signup_date)
Upvotes: 2