ML_Dev

Reputation: 105

How can one conditionally compute date intervals in a Python pandas dataframe?

I have a pandas DataFrame with a dueDate column of type datetime64. It also has a boolean column that records whether each task is completed.

import pandas as pd
df = pd.DataFrame(data=[pd.date_range('1/1/2017', periods=6), [i % 2 == 0 for i in range(6)]]).T
df.columns = ['dueDate', 'completed']

If the task is not completed and the due date has passed (greater than today), I want the corresponding row of a new column called daysLate to store the difference between the due date and today. Otherwise, I want that row to be NaN.

[Image: desired output of the pandas DataFrame]

Other than iterating through each row and applying multiple if-statements, is there an elegant or best-practice way of approaching a problem like this?

Upvotes: 0

Views: 259

Answers (1)

Leb

Reputation: 15953

import numpy as np
import pandas as pd


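# Sample data: ten consecutive due dates, with every other task marked completed
df = pd.DataFrame(data=[pd.date_range('1/1/2017', periods=10), [i % 2 == 0 for i in range(10)]]).T
df.columns = ['dueDate', 'completed']

# Start with an empty column; only rows matching the condition get a value
df['daysLate'] = np.nan

today = pd.to_datetime('today')
# Not completed, and due date greater than today (the condition as stated in the question)
mask = (df.dueDate > today) & (df.completed != True)
# Assign through .loc rather than chained indexing, which may write to a temporary copy
df.loc[mask, 'daysLate'] = df.dueDate - today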

print(df)

Is this what you're looking for?

               dueDate completed  daysLate
0  2017-01-01 00:00:00      True       NaT
1  2017-01-02 00:00:00     False       NaT
2  2017-01-03 00:00:00      True       NaT
3  2017-01-04 00:00:00     False       NaT
4  2017-01-05 00:00:00      True       NaT
5  2017-01-06 00:00:00     False       NaT
6  2017-01-07 00:00:00      True       NaT
7  2017-01-08 00:00:00     False    2 days
8  2017-01-09 00:00:00      True       NaT
9  2017-01-10 00:00:00     False    4 days

Actually, if you are OK with NaT, you can skip importing numpy and change df['daysLate'] = np.nan to df['daysLate'] = pd.NaT
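If "late" is meant as a due date that is already behind today, the comparison flips and the subtraction goes the other way. Here is a minimal sketch of that variant using Series.where. It is not what the code above does, and it makes a few assumptions of its own: the frame is built from a dict so the columns have real datetime64 and bool dtypes, a fixed reference date stands in for today so the result is reproducible, and daysLateNum is just an illustrative column name.

```python
import pandas as pd

# Typed columns (datetime64 and bool) instead of the transposed object-dtype frame
df = pd.DataFrame({
    'dueDate': pd.date_range('2017-01-01', periods=10),
    'completed': [i % 2 == 0 for i in range(10)],
})

# Assumption: a fixed reference date; in real use this could be
# pd.Timestamp('today').normalize() to drop the time of day
today = pd.Timestamp('2017-01-06')

# A task counts as late when its due date is before today and it is not completed
late = (df['dueDate'] < today) & ~df['completed']

# Keep the elapsed time where the mask is True; every other row becomes NaT
df['daysLate'] = (today - df['dueDate']).where(late)

# Optional: plain day counts (hypothetical column name), NaN where not late
df['daysLateNum'] = df['daysLate'].dt.days

print(df)
```

With these sample dates, only the uncompleted tasks due on 2017-01-02 and 2017-01-04 get a value (4 days and 2 days). Note that ~ is only safe here because completed is a real bool column; on the object-dtype column produced by the transposed frame, the != True comparison is the safer choice.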

Upvotes: 1
