Reputation: 89
On Pandas Dataframes-
I have a dataframe that looks like this:
ID date1 date2 date3
0 NaT NaT 2019-02-28
1 NaT 2019-11-10 2019-04-09
2 NaT NaT 2019-05-21
3 2019-12-06 2019-12-05 2019-12-27
4 2019-11-18 2019-11-18 2019-12-20
I want to add another field in the dataframe in order to have an output like this:
ID date1 date2 date3 date4
0 NaT NaT 2019-02-28 2019-02-28
1 NaT 2019-11-10 2019-04-09 2019-11-10
2 NaT NaT 2019-05-21 2019-05-21
3 2019-12-06 2019-12-05 2019-12-27 2019-12-06
4 2019-11-18 2019-11-18 2019-12-20 2019-11-18
The rules, in order:
date4 is populated from date3 if date1 and date2 are not populated.
date4 is populated from date2 if date2 and date3 are populated.
date4 is populated from date1 if date1, date2 and date3 are populated.
Do you have any idea how to achieve this?
I tried fillna, but without much success.
Upvotes: 1
Views: 1561
Reputation: 33
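I'd personally use a loop so that it's clear what's happening and the code is easier to maintain. Here's an example. Note that your listed conditions for 'date4' don't cover every combination (for example, date1 populated but date2 missing), so such rows end up without an entry.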
# required modules
import pandas as pd

# create a data frame
d = {'ID': [0, 1, 2, 3, 4],
     'date1': [None, None, None, '2019-12-06', '2019-11-18'],
     'date2': [None, '2019-11-10', None, '2019-12-05', '2019-11-18'],
     'date3': ['2019-02-28', '2019-04-09', '2019-05-21', '2019-12-27', '2019-12-20']}
dat = pd.DataFrame(d)

# make date4, checking the most specific condition first
date4 = []
for index, row in dat.iterrows():
    # test each column explicitly; `a and b == None` only compares b
    has1, has2, has3 = (pd.notna(row[c]) for c in ('date1', 'date2', 'date3'))
    if has1 and has2 and has3:
        date4.append(row['date1'])
    elif has2 and has3:
        date4.append(row['date2'])
    elif not has1 and not has2:
        date4.append(row['date3'])
    else:
        # combination not covered by the listed conditions
        date4.append(None)

# append to df
dat['date4'] = date4
Upvotes: 1
Reputation: 5738
You can replace the NaN values like this:
import numpy as np
df['date4'] = np.where(df['date1'].isna(), df['date2'], df['date1'])
df['date4'] = np.where(df['date4'].isna(), df['date3'], df['date4'])
There are some cases you didn't consider (for example, date1 is not NaN but date2 is), but they can all be handled similarly.
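As a rough check of the case mentioned above (date1 set but date2 missing), here is a minimal sketch of the two-step np.where fallback. The sample frame is made up for illustration and is not the asker's data; the second step keeps the value already in date4 and only falls back to date3 where it is still missing.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame; the last row has date1 set but date2 missing
df = pd.DataFrame({
    'date1': pd.to_datetime([None, None, '2019-12-06', '2019-11-18']),
    'date2': pd.to_datetime([None, '2019-11-10', '2019-12-05', None]),
    'date3': pd.to_datetime(['2019-02-28', '2019-04-09', '2019-12-27', '2019-12-20']),
})

# Step 1: take date1 where present, otherwise date2
df['date4'] = np.where(df['date1'].isna(), df['date2'], df['date1'])
# Step 2: keep that value where present, otherwise fall back to date3
df['date4'] = np.where(df['date4'].isna(), df['date3'], df['date4'])

print(df)
```

With this ordering the last row picks up date1, so the "date1 set, date2 missing" case needs no extra branch.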
Upvotes: 1
Reputation: 862911
Back-fill the missing values along each row, then select the first column by position with DataFrame.iloc:
df['date4'] = df[['date1','date2','date3']].bfill(axis=1).iloc[:, 0]
Or use Series.fillna or Series.combine_first:
df['date4'] = df['date1'].fillna(df['date2']).fillna(df['date3'])
print (df)
ID date1 date2 date3 date4
0 0 NaT NaT 2019-02-28 2019-02-28
1 1 NaT 2019-11-10 2019-04-09 2019-11-10
2 2 NaT NaT 2019-05-21 2019-05-21
3 3 2019-12-06 2019-12-05 2019-12-27 2019-12-06
4 4 2019-11-18 2019-11-18 2019-12-20 2019-11-18
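Series.combine_first is mentioned above but not shown, so here is a minimal self-contained sketch of that variant. It assumes the date columns are already datetime64 (built here with pd.to_datetime); the frame is reconstructed from the question's sample data.

```python
import pandas as pd

# Sample frame rebuilt from the question; None becomes NaT after to_datetime
df = pd.DataFrame({
    'ID': [0, 1, 2, 3, 4],
    'date1': pd.to_datetime([None, None, None, '2019-12-06', '2019-11-18']),
    'date2': pd.to_datetime([None, '2019-11-10', None, '2019-12-05', '2019-11-18']),
    'date3': pd.to_datetime(['2019-02-28', '2019-04-09', '2019-05-21',
                             '2019-12-27', '2019-12-20']),
})

# Prefer date1, fill its gaps from date2, then fill what is left from date3
df['date4'] = df['date1'].combine_first(df['date2']).combine_first(df['date3'])

print(df)
```

The order of the chain sets the priority, so swapping the columns around changes which date wins.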
Upvotes: 3