Reputation: 89
I want to create a function that flags the rows based on certain conditions.
It's not working and I think it's caused by the format of the columns.
The function is:
tolerance=5
def pmm_m2_rag(data):
if data['m2'] == data['TP_M2'] and data['m6p'] + pd.to_timedelta(tolerance,unit='D') <= data['latedate']:
return 'GREEN'
elif data['m2']!= data['TP_M2'] and data['m6p'] + pd.to_timedelta(tolerance,unit='D') < data['latedate']:
return 'AMBER'
elif data['m2']!= None and data['m6p'] + pd.to_timedelta(tolerance,unit='D') > data['latedate']:
return 'RED'
The dataframe is :
m2 TP_M2 m6p latedate
0 2019-11-28 2019-10-29 2020-02-21 2020-02-25
1 2019-11-28 2019-10-29 2020-02-21 2020-02-25
2 2019-11-28 2019-11-28 2020-02-09 2020-02-17
3 2019-11-28 2019-11-28 2020-02-29 2020-02-17
The datatype is:
m2 object
TP_M2 object
m6p object
latedate object
dtype: object
Expected output:
m2 TP_M2 m6p latedate RAG
0 2019-11-28 2019-10-29 2020-02-21 2020-02-25 AMBER
1 2019-11-28 2019-10-29 2020-02-21 2020-02-25 AMBER
2 2019-11-28 2019-11-28 2020-02-09 2020-02-17 GREEN
3 2019-11-28 2019-11-28 2020-02-29 2020-02-17 RED
Upvotes: 0
Views: 1158
Reputation: 3097
one option it to convert object
into datetime
before doing the datetime
comparisons as below
from datetime import datetime
tolerance=5
def pmm_m2_rag(data):
#m2 = datetime.strptime(data['m2'],'%Y-%m-%d')
#m6p = datetime.strptime(data['m6p'],'%Y-%m-%d')
#latedate = datetime.strptime(data['latedate'],'%Y-%m-%d')
#TP_M2 = datetime.strptime(data['TP_M2'],'%Y-%m-%d')
m2 = datetime.strptime(str(data['m2']),'%Y-%m-%d')
m6p = datetime.strptime(str(data['m6p']),'%Y-%m-%d')
latedate = datetime.strptime(str(data['latedate']),'%Y-%m-%d')
TP_M2 = datetime.strptime(str(data['TP_M2']),'%Y-%m-%d')
if m2 == TP_M2 and m6p + pd.to_timedelta(tolerance,unit='D') <= latedate:
return 'GREEN'
elif m2!= TP_M2 and m6p + pd.to_timedelta(tolerance,unit='D') < latedate:
return 'AMBER'
elif m2!= None and m6p + pd.to_timedelta(tolerance,unit='D') > latedate:
return 'RED'
df['RAG'] = df.apply(pmm_m2_rag, axis=1)
Upvotes: 1
Reputation: 3841
First of all, something in your code seems to be wrong. This
... unit='D') <= data['latedate'] < data['m6p'] ...
chaining of comparisons is definitely wrong.
Then in your conditon for AMBER the two clauses of you or
are identical. This also makes no sense.
Apart from that you should convert the datatypes of your columns to type datetime
. E.g. by:
data = data.applymap(pd.to_datetime)
This depends on what the datatype is when you read from your database.
After that, there are basically two options. You can write a function that takes a single row, calculates the value and returns the color. Then apply this function row by row.
The other (faster and preferrable) option is to calculate the column 'RAG' in parallel.
This can be done by using numpy.where
with the conditions you have written above.
Please note that and
between datafram columns has to be written as &
; or
as |
.
Something like this should work:
import numpy as np
def pmm_m2_rag(data):
green_filter = (data.m2 == data.TP_M2) & \
(data.m6p + pd.to_timedelta(tolerance,unit='D') <= data.latedate)
amber_filter = (data.m2 != data.TP_M2) & \
(data.m6p + pd.to_timedelta(tolerance,unit='D') < data.latedate) | \
(data.latedate > data.m6p + pd.to_timedelta(tolerance,unit='D'))
red_filter = (data.m2 != pd.NaT) & \
(data.m6p + pd.to_timedelta(tolerance,unit='D') > data.latedate)
data['RAG'] = np.where(green_filter, 'GREEN', np.where(amber_filter, 'AMBER', np.where(red_filter, 'RED', '')))
The syntax of np.where
is
np.where(<CONDITION>, true-clause, false-clause)
Upvotes: 1