Reputation: 2871
I have df as shown below
df:
ID Actual_DOB Tentative_DOB
1 NaN 2002-01-01
2 2020-06-23 2020-01-01
3 NaN NaN
4 2018-06-29 NaN
About df:
Actual_DOB
- Actual date of birth,
Tentative_DOB
- Tentative date of birth - when the exact date of birth is not available - only the year of birth is available - then date of birth is assumed on 1st Jan of the birth year.
From the above, I would like to calculate a column named age_in_days based on the conditions explained below.
Steps:
1. If Actual_DOB is available (not an NaT)
age_in_days
is the number of days from Actual_DOB
to 2021-03-31
2. Else,
age_in_days
is the number of days from Tentative_DOB
to 2021-03-31
I tried below code:
df['Actual_DOB'] = pd.to_datetime(df.Actual_DOB)
df['Tentative_DOB'] = pd.to_datetime(df.Tentative_DOB)
df['age_in_days1'] = (pd.Timestamp('2021-03-31') - df.Actual_DOB).dt.days
df['age_in_days2'] = (pd.Timestamp('2021-03-31') - df.Tentative_DOB).dt.days
Expected Output:
ID Actual_DOB Tentative_DOB age_in_days
1 NaN 2002-01-01 7029
2 2020-06-23 2020-01-01 281
3 NaT NaT NaN
4 2018-06-29 NaT 1006
Upvotes: 1
Views: 75
Reputation: 133458
With your shown samples, could you please try following. Using fillna
along with pd.to_datetime
functions here. Simple explanation would be, create date variable where we are filling values of NaNs with respect to Tentative_DOB
column in Actuak_DOB
column. Then using pd.to.datetime
function to make sure that time is in correct format. Finally substracting date variable value with 2021-03-31 date(given by OP) and converting it to days into a new column of DataFrame.
date = df['Actual_DOB'].fillna(df['Tentative_DOB'])
date = pd.to_datetime(date,yearfirst=True)
df['age_in_days'] = (pd.Timestamp('2021-03-31')-date).dt.days
Output of DataFrame will be as follows:
ID Actual_DOB Tentative_DOB age_in_days
0 1 NaN 2002-01-01 7029.0
1 2 2020-06-23 2020-01-01 281.0
2 3 NaN NaN NaN
3 4 2018-06-29 NaN 1006.0
Upvotes: 3
Reputation: 26676
Use np.select(condition_list, results-list)
Coarce the dates to datetime and append tcolumn with 2021-03-31
df['Actual_DOB'],df['Tentative_DOB'], df['age_in_days']=pd.to_datetime(df['Actual_DOB']),pd.to_datetime(df['Tentative_DOB']),pd.Timestamp('2021-03-31')
condition_list
cond=[df.Actual_DOB.notna(),df.Tentative_DOB.notna()]
results-list
res=[df['age_in_days']-df.Actual_DOB,df['age_in_days']-df.Tentative_DOB]
np.where
df['age_in_days']=np.select(cond,res, np.nan)/ np.timedelta64(1, 'D')
Upvotes: 1