Bartek Malysz
Bartek Malysz

Reputation: 1022

Pandas apply value if a certain condition is met

I have a sample dataframe as per below:

{'Plan Date': {23186: nan,
  18533: nan,
  17980: nan,
  28559: nan,
  22587: 1572480000000000000,
  22522: 1580428800000000000,
  16639: 1572480000000000000,
  14304: 1572480000000000000},
 'Support Date': {23186: Timestamp('2020-12-31 00:00:00'),
  18533: Timestamp('2022-03-31 00:00:00'),
  17980: Timestamp('2022-03-31 00:00:00'),
  28559: Timestamp('2019-02-28 00:00:00'),
  22587: Timestamp('2019-07-31 00:00:00'),
  22522: Timestamp('2019-07-31 00:00:00'),
  16639: Timestamp('2019-10-31 00:00:00'),
  14304: Timestamp('2019-10-31 00:00:00')},
 'Plan Deadline': {23186: NaT,
  18533: NaT,
  17980: NaT,
  28559: NaT,
  22587: Timestamp('2019-10-31 00:00:00'),
  22522: Timestamp('2020-01-31 00:00:00'),
  16639: Timestamp('2019-10-31 00:00:00'),
  14304: Timestamp('2019-10-31 00:00:00')}}

I wanted to populate Plan Date column with max of the 2 dates sitting in other columns in the same dataframe only if Plan Deadline column is populated. Using np.where() was very convoluted and I was hoping for a more elegant / efficient way.

Upvotes: 1

Views: 227

Answers (2)

jezrael
jezrael

Reputation: 862591

You can avoid apply for better performance and assign max value per condition:

data['Plan Date'] = pd.to_datetime(data['Plan Date'])

mask = data['Plan Deadline'].notnull()
data.loc[mask, 'Plan Date'] = data.loc[mask, ['Support Date', 'Plan Deadline']].max(axis=1)
print (data)
       Plan Date Support Date Plan Deadline
14304 2019-10-31   2019-10-31    2019-10-31
16639 2019-10-31   2019-10-31    2019-10-31
17980        NaT   2022-03-31           NaT
18533        NaT   2022-03-31           NaT
22522 2020-01-31   2019-07-31    2020-01-31
22587 2019-10-31   2019-07-31    2019-10-31
23186        NaT   2020-12-31           NaT
28559        NaT   2019-02-28           NaT

Upvotes: 1

Bartek Malysz
Bartek Malysz

Reputation: 1022

I found a way (with a help of a friend) to do it omitting np.where and convoluted conditions all together:

data['Plan Date'] = data[['Support Date', 'Plan Deadline']].apply(lambda x: pd.NaT if pd.isnull(x[1]) else max(x), axis=1)

Upvotes: 0

Related Questions