superflow
superflow

Reputation: 53

Add date in rows where date is missing based on other column

I have a data frame with time series data. In one column I have signup dates, and in the other cancel dates. I want to add a date for missing cancel dates that is less than a specific date, but maximum 40 weeks.

How should I proceed?

if df['cancel_date'] is NaT, then add date max. + 40 weeks. df['cancel_date'] - df['signup_date'] should not be less than 0.

Upvotes: 0

Views: 51

Answers (2)

PV8
PV8

Reputation: 6260

I would use numpy.where, if you want to append a difference column directly between singup date and cancel date:

df['date difference between signup and cancel'] = np.where(df['cancel_date'] == np.nan, (df['signup_date'].max() + pd.Timedelta(40, 'W'))-df['signup_date'], df['cancel_date']-df['signup_date'])

This will give you a new column where you would have directly the difference between the signup date and the cancel date

Upvotes: -1

Chris Adams
Chris Adams

Reputation: 18647

IIUC, you can use Series.fillna with pandas.Timedelta class.

If adding 40 weeks to the records signup_date:

df['cancel_date'] = df['cancel_date'].fillna(df['signup_date'] + pd.Timedelta(40, 'W'))

If adding 40 weeks to maximum date in the sign_up column:

df['cancel_date'] = df['cancel_date'].fillna(df['signup_date'].max() + pd.Timedelta(40, 'W'))

Or if using some predefined max date value, with the constraint that signup_date < cancel_date, chain on the clip method:

max_date = pd.datetime(2018, 4, 30)

df['cancel_date'] = df['cancel_date'].fillna(max_date + pd.Timedelta(40, 'W')).clip(lower=df.signup_date)

Upvotes: 2

Related Questions