Reputation: 1022
I have a sample dataframe, shown below as a dict:
{'Plan Date': {23186: nan,
18533: nan,
17980: nan,
28559: nan,
22587: 1572480000000000000,
22522: 1580428800000000000,
16639: 1572480000000000000,
14304: 1572480000000000000},
'Support Date': {23186: Timestamp('2020-12-31 00:00:00'),
18533: Timestamp('2022-03-31 00:00:00'),
17980: Timestamp('2022-03-31 00:00:00'),
28559: Timestamp('2019-02-28 00:00:00'),
22587: Timestamp('2019-07-31 00:00:00'),
22522: Timestamp('2019-07-31 00:00:00'),
16639: Timestamp('2019-10-31 00:00:00'),
14304: Timestamp('2019-10-31 00:00:00')},
'Plan Deadline': {23186: NaT,
18533: NaT,
17980: NaT,
28559: NaT,
22587: Timestamp('2019-10-31 00:00:00'),
22522: Timestamp('2020-01-31 00:00:00'),
16639: Timestamp('2019-10-31 00:00:00'),
14304: Timestamp('2019-10-31 00:00:00')}}
I want to populate the Plan Date column with the max of the two dates in the other columns of the same dataframe, but only where the Plan Deadline column is populated. Using np.where() was very convoluted, and I was hoping for a more elegant / efficient way.
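For anyone who wants to run the answers below, the sample can be reconstructed as a DataFrame. This is a trimmed sketch (two rows instead of eight); the raw integers in Plan Date are nanosecond epoch values:

```python
import numpy as np
import pandas as pd

# Trimmed reconstruction of the sample: one row without a deadline, one with.
# The raw integer in 'Plan Date' is a nanosecond epoch value.
data = pd.DataFrame({
    'Plan Date': {23186: np.nan, 22587: 1572480000000000000},
    'Support Date': {23186: pd.Timestamp('2020-12-31'),
                     22587: pd.Timestamp('2019-07-31')},
    'Plan Deadline': {23186: pd.NaT, 22587: pd.Timestamp('2019-10-31')},
})

# Converting the nanosecond integers yields proper timestamps (NaN -> NaT)
plan = pd.to_datetime(data['Plan Date'])
```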
Upvotes: 1
Views: 227
Reputation: 862591
You can avoid apply for better performance and assign the row-wise max only where the condition holds:
# 'Plan Date' holds raw nanosecond integers, so convert it to datetime first
data['Plan Date'] = pd.to_datetime(data['Plan Date'])
# fill only the rows where 'Plan Deadline' is present
mask = data['Plan Deadline'].notnull()
data.loc[mask, 'Plan Date'] = data.loc[mask, ['Support Date', 'Plan Deadline']].max(axis=1)
print(data)
Plan Date Support Date Plan Deadline
14304 2019-10-31 2019-10-31 2019-10-31
16639 2019-10-31 2019-10-31 2019-10-31
17980 NaT 2022-03-31 NaT
18533 NaT 2022-03-31 NaT
22522 2020-01-31 2019-07-31 2020-01-31
22587 2019-10-31 2019-07-31 2019-10-31
23186 NaT 2020-12-31 NaT
28559 NaT 2019-02-28 NaT
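The same fill can also be written as a single expression with Series.where, which keeps the row-wise max only where a deadline exists. This is a sketch on a trimmed version of the sample, not part of the original answer:

```python
import pandas as pd

# Trimmed version of the sample data
data = pd.DataFrame({
    'Support Date': [pd.Timestamp('2020-12-31'), pd.Timestamp('2019-07-31')],
    'Plan Deadline': [pd.NaT, pd.Timestamp('2019-10-31')],
}, index=[23186, 22587])

# Row-wise max of the two date columns, kept only where a deadline exists;
# rows without a deadline become NaT via Series.where
data['Plan Date'] = (
    data[['Support Date', 'Plan Deadline']]
    .max(axis=1)
    .where(data['Plan Deadline'].notna())
)
```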
Upvotes: 1
Reputation: 1022
With the help of a friend, I found a way to do it omitting np.where and convoluted conditions altogether:
data['Plan Date'] = data[['Support Date', 'Plan Deadline']].apply(lambda x: pd.NaT if pd.isnull(x[1]) else max(x), axis=1)
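For what it's worth, positional indexing like x[1] inside apply is deprecated in recent pandas versions; a label-based sketch of the same idea, on a trimmed version of the sample:

```python
import pandas as pd

# Trimmed version of the sample data
data = pd.DataFrame({
    'Support Date': [pd.Timestamp('2020-12-31'), pd.Timestamp('2019-07-31')],
    'Plan Deadline': [pd.NaT, pd.Timestamp('2019-10-31')],
}, index=[23186, 22587])

# Same logic with label-based access: NaT when the deadline is missing,
# otherwise the later of the two dates
data['Plan Date'] = data[['Support Date', 'Plan Deadline']].apply(
    lambda row: pd.NaT if pd.isna(row['Plan Deadline'])
    else max(row['Support Date'], row['Plan Deadline']),
    axis=1,
)
```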
Upvotes: 0