Jenny Jing Yu
Jenny Jing Yu

Reputation: 195

Calculate date time difference of two columns with null value in dataframe

I had a dataframe like below:

    startdate   terminationdate
0   1997-07-13  2004-09-29
1   1999-07-26  2016-03-23
2   2003-04-01  NaT
3   2007-06-01  NaT
4   2009-06-01  NaT

I would like to get the output to calculate the tenure in months. For null value in terminationdate, I would like to use current date to calculate.

I tried the code below:


def tenure(df):

    if df['terminationdate'] != np.nan:
        tenure = (df['terminationdate'] - df['startdate'])/np.timedelta64(1, 'M')

    else:
        tenure = (datetime.datetime.now() - df['startdate'])/np.timedelta64(1, 'M')
    return tenure

The tenure of NaT value could not be calculated with above code.

Upvotes: 1

Views: 2519

Answers (3)

Serge Ballesta
Serge Ballesta

Reputation: 148880

You can just use fillna to replace the NaT in terminationdate with current one:

tenure = (df['terminationdate'].fillna(datetime.datetime.today()) - df['startdate'])/
          np.timedelta64(1, 'M')

Upvotes: 2

Shenan
Shenan

Reputation: 338

tawab_shakeel's answer is valid. In case you want to keep the null in the dataset and follow your logic, you can use the apply function to each raw

from datetime import datetime
import pandas as pd
import numpy as np

df = pd.DataFrame({'startdate': pd.to_datetime(['1997-07-13', '1999-07-26', '2003-04-01', '2007-06-01', '2009-06-01']),
                   'terminationdate': pd.to_datetime(['2004-09-29', '2016-03-23', None, None, None])})

def tenure(start_date, end_date):
    if pd.isna(end_date) == False:
        tenure = (end_date - start_date)/np.timedelta64(1, 'M')
    else: 
        tenure = (datetime.now() - start_date)/np.timedelta64(1, 'M')
    return tenure

tenure_month = df.apply(lambda row: tenure(row['startdate'], row['terminationdate']), axis = 1)

Upvotes: 2

tawab_shakeel
tawab_shakeel

Reputation: 3739

Try converting columns into date using pd.to_datetime()
fill current date using np.where and datetime.datetime.now().date
final apply your formula of difference

import pandas as pd
import numpy as np
import datetime

df['startdate'] = pd.to_datetime(df['startdate']).dt.date
df['terminationdate'] = pd.to_datetime(df['terminationdate']).dt.date

df['terminationdate'] = np.where(df['terminationdate'].isnull(),
                                datetime.datetime.now().date(),
                                df['terminationdate'])

df['result'] = (df['terminationdate'] - df['startdate'])/np.timedelta64(1, 'M')

Upvotes: 0

Related Questions