Danish
Danish

Reputation: 2871

Calculate age in days from date column based availability actual date birth - else consider tentative DOB column - pandas

I have df as shown below

df:

ID       Actual_DOB      Tentative_DOB
1        NaN             2002-01-01
2        2020-06-23      2020-01-01
3        NaN             NaN
4        2018-06-29      NaN 

About df:

Actual_DOB - Actual date of birth,

Tentative_DOB - Tentative date of birth - when the exact date of birth is not available - only the year of birth is available - then date of birth is assumed on 1st Jan of the birth year.

From the above, I would like to calculate a column named age_in_days based on the conditions explained below.

Steps:

1. If Actual_DOB is available (not an NaT)

age_in_days is the number of days from Actual_DOB to 2021-03-31

2. Else, 

age_in_days is the number of days from Tentative_DOB to 2021-03-31

I tried below code:

df['Actual_DOB'] = pd.to_datetime(df.Actual_DOB)
df['Tentative_DOB'] = pd.to_datetime(df.Tentative_DOB)
df['age_in_days1'] = (pd.Timestamp('2021-03-31') - df.Actual_DOB).dt.days
df['age_in_days2'] = (pd.Timestamp('2021-03-31') - df.Tentative_DOB).dt.days

Expected Output:

ID       Actual_DOB      Tentative_DOB    age_in_days
1        NaN             2002-01-01       7029
2        2020-06-23      2020-01-01       281
3        NaT             NaT              NaN
4        2018-06-29      NaT              1006  

Upvotes: 1

Views: 75

Answers (2)

RavinderSingh13
RavinderSingh13

Reputation: 133458

With your shown samples, could you please try following. Using fillna along with pd.to_datetime functions here. Simple explanation would be, create date variable where we are filling values of NaNs with respect to Tentative_DOB column in Actuak_DOB column. Then using pd.to.datetime function to make sure that time is in correct format. Finally substracting date variable value with 2021-03-31 date(given by OP) and converting it to days into a new column of DataFrame.

date = df['Actual_DOB'].fillna(df['Tentative_DOB'])
date  = pd.to_datetime(date,yearfirst=True)
df['age_in_days'] = (pd.Timestamp('2021-03-31')-date).dt.days

Output of DataFrame will be as follows:

   ID  Actual_DOB Tentative_DOB  age_in_days
0   1         NaN    2002-01-01       7029.0
1   2  2020-06-23    2020-01-01        281.0
2   3         NaN           NaN          NaN
3   4  2018-06-29           NaN       1006.0

Upvotes: 3

wwnde
wwnde

Reputation: 26676

Use np.select(condition_list, results-list)

Coarce the dates to datetime and append tcolumn with 2021-03-31

df['Actual_DOB'],df['Tentative_DOB'], df['age_in_days']=pd.to_datetime(df['Actual_DOB']),pd.to_datetime(df['Tentative_DOB']),pd.Timestamp('2021-03-31')

condition_list

cond=[df.Actual_DOB.notna(),df.Tentative_DOB.notna()]

results-list

res=[df['age_in_days']-df.Actual_DOB,df['age_in_days']-df.Tentative_DOB]

np.where

df['age_in_days']=np.select(cond,res, np.nan)/ np.timedelta64(1, 'D')

Upvotes: 1

Related Questions