Reputation: 1022
I have a sample dataframe, shown below as a dict:
{'Plan Date': {23186: nan,
18533: nan,
17980: nan,
28559: nan,
22587: 1572480000000000000,
22522: 1580428800000000000,
16639: 1572480000000000000,
14304: 1572480000000000000},
'Support Date': {23186: Timestamp('2020-12-31 00:00:00'),
18533: Timestamp('2022-03-31 00:00:00'),
17980: Timestamp('2022-03-31 00:00:00'),
28559: Timestamp('2019-02-28 00:00:00'),
22587: Timestamp('2019-07-31 00:00:00'),
22522: Timestamp('2019-07-31 00:00:00'),
16639: Timestamp('2019-10-31 00:00:00'),
14304: Timestamp('2019-10-31 00:00:00')},
'Plan Deadline': {23186: NaT,
18533: NaT,
17980: NaT,
28559: NaT,
22587: Timestamp('2019-10-31 00:00:00'),
22522: Timestamp('2020-01-31 00:00:00'),
16639: Timestamp('2019-10-31 00:00:00'),
14304: Timestamp('2019-10-31 00:00:00')}}
I want to populate the Plan Date column with the max of the two dates in the other columns of the same dataframe, but only where the Plan Deadline column is populated. Using np.where() was very convoluted, and I was hoping for a more elegant / efficient way.
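For anyone who wants to run the answers below, the sample can be reconstructed as a DataFrame. This is a trimmed sketch (two rows instead of eight); the raw integers in Plan Date are nanosecond epoch values:

```python
import numpy as np
import pandas as pd

# Trimmed reconstruction of the sample: one row without a deadline, one with.
# The raw integer in 'Plan Date' is a nanosecond epoch value.
data = pd.DataFrame({
    'Plan Date': {23186: np.nan, 22587: 1572480000000000000},
    'Support Date': {23186: pd.Timestamp('2020-12-31'),
                     22587: pd.Timestamp('2019-07-31')},
    'Plan Deadline': {23186: pd.NaT, 22587: pd.Timestamp('2019-10-31')},
})

# Converting the nanosecond integers yields proper timestamps (NaN -> NaT)
plan = pd.to_datetime(data['Plan Date'])
```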
Upvotes: 1
Views: 227
Reputation: 862591
You can avoid apply for better performance and assign the row-wise max only where the condition holds:
# 'Plan Date' holds raw nanosecond integers, so convert it to datetime first
data['Plan Date'] = pd.to_datetime(data['Plan Date'])
# fill only the rows where 'Plan Deadline' is present
mask = data['Plan Deadline'].notnull()
data.loc[mask, 'Plan Date'] = data.loc[mask, ['Support Date', 'Plan Deadline']].max(axis=1)
print(data)
Plan Date Support Date Plan Deadline
14304 2019-10-31 2019-10-31 2019-10-31
16639 2019-10-31 2019-10-31 2019-10-31
17980 NaT 2022-03-31 NaT
18533 NaT 2022-03-31 NaT
22522 2020-01-31 2019-07-31 2020-01-31
22587 2019-10-31 2019-07-31 2019-10-31
23186 NaT 2020-12-31 NaT
28559 NaT 2019-02-28 NaT
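The same fill can also be written as a single expression with Series.where, which keeps the row-wise max only where a deadline exists. This is a sketch on a trimmed version of the sample, not part of the original answer:

```python
import pandas as pd

# Trimmed version of the sample data
data = pd.DataFrame({
    'Support Date': [pd.Timestamp('2020-12-31'), pd.Timestamp('2019-07-31')],
    'Plan Deadline': [pd.NaT, pd.Timestamp('2019-10-31')],
}, index=[23186, 22587])

# Row-wise max of the two date columns, kept only where a deadline exists;
# rows without a deadline become NaT via Series.where
data['Plan Date'] = (
    data[['Support Date', 'Plan Deadline']]
    .max(axis=1)
    .where(data['Plan Deadline'].notna())
)
```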
Upvotes: 1
Reputation: 1022
With the help of a friend, I found a way to do it omitting np.where and convoluted conditions altogether:
data['Plan Date'] = data[['Support Date', 'Plan Deadline']].apply(lambda x: pd.NaT if pd.isnull(x[1]) else max(x), axis=1)
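For what it's worth, positional indexing like x[1] inside apply is deprecated in recent pandas versions; a label-based sketch of the same idea, on a trimmed version of the sample:

```python
import pandas as pd

# Trimmed version of the sample data
data = pd.DataFrame({
    'Support Date': [pd.Timestamp('2020-12-31'), pd.Timestamp('2019-07-31')],
    'Plan Deadline': [pd.NaT, pd.Timestamp('2019-10-31')],
}, index=[23186, 22587])

# Same logic with label-based access: NaT when the deadline is missing,
# otherwise the later of the two dates
data['Plan Date'] = data[['Support Date', 'Plan Deadline']].apply(
    lambda row: pd.NaT if pd.isna(row['Plan Deadline'])
    else max(row['Support Date'], row['Plan Deadline']),
    axis=1,
)
```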
Upvotes: 0