Reputation: 105
I have a pandas DataFrame with a dueDate column of type datetime64. I also have a boolean column in the DataFrame that indicates whether a task is completed.
import pandas as pd
from datetime import date
df = pd.DataFrame(data = [pd.date_range('1/1/2017', periods = 6), [True if i%2 == 0 else False for i in range(6)]]).T
df.columns = ['dueDate', 'completed']
If a task is not completed and its due date has passed (greater than today), I want the daysLate column to hold the number of days late for that row. Otherwise, I want that row to be NaN.
Other than iterating through each row and applying multiple if-statements, is there an elegant or best-practice way of approaching a problem like this?
Upvotes: 0
Views: 259
Reputation: 15953
import numpy as np
import pandas as pd
df = pd.DataFrame(data=[pd.date_range('1/1/2017', periods=10), [True if i % 2 == 0 else False for i in range(10)]]).T
df.columns = ['dueDate', 'completed']
df['daysLate'] = np.nan
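# Select incomplete tasks whose due date is later than today
mask = (df.dueDate > pd.to_datetime('today')) & (df.completed != True)
# Assign through .loc to avoid chained assignment, which may write to a copy
df.loc[mask, 'daysLate'] = df.dueDate - pd.to_datetime('today')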
print(df)
Is this what you're looking for?
               dueDate completed daysLate
0  2017-01-01 00:00:00      True      NaT
1  2017-01-02 00:00:00     False      NaT
2  2017-01-03 00:00:00      True      NaT
3  2017-01-04 00:00:00     False      NaT
4  2017-01-05 00:00:00      True      NaT
5  2017-01-06 00:00:00     False      NaT
6  2017-01-07 00:00:00      True      NaT
7  2017-01-08 00:00:00     False   2 days
8  2017-01-09 00:00:00      True      NaT
9  2017-01-10 00:00:00     False   4 days
Actually, if you are OK with NaT, you can skip importing numpy and change df['daysLate'] = np.nan to df['daysLate'] = pd.NaT.
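Note that the code above flags due dates later than today. If "the due date has passed" is instead read as the due date being earlier than today, the comparison and the subtraction both flip. Here is a minimal sketch under that reading. The fixed `today` value and the `late` mask name are my own assumptions for illustration, not something from the question:

```python
import pandas as pd

# Hypothetical fixed "today" so the example is reproducible; in practice
# something like pd.Timestamp.today().normalize() could be used instead.
today = pd.Timestamp('2017-01-04')

df = pd.DataFrame({
    'dueDate': pd.date_range('1/1/2017', periods=6),
    'completed': [i % 2 == 0 for i in range(6)],
})

# A task is late when it is not completed and its due date is before today.
late = ~df['completed'] & (df['dueDate'] < today)

# Whole days overdue for late tasks, NaN everywhere else.
df['daysLate'] = (today - df['dueDate']).dt.days.where(late)
print(df)
```

Building the frame from a dict keeps dueDate as datetime64 and completed as bool, so the mask and the subtraction are both vectorized and no row-by-row loop is needed.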
Upvotes: 1