Reputation: 27
The following code creates as column called ‘Flagged’ in SAS:
Case When t3.Proportion=. then case when t3.'Standardised proportion'n >t1.SigmaMultiple Then 1 else 0
End
Else
Case When t3.Proportion=. and abs(t3.'Standardised proportion'n) > t1.SigmaMultiple Then 1 Else 0
End
End
I'm trying to replicate it in python and usually I would have done a conditions code however the nested Case when aspect is confusing me.
The code i tried but didn't seem to be matching:
conditions =[
((dfSigmamissing['Proportion'] == 0) & (dfSigmamissing['SP'] > dfSigmamissing['SigmaMultiple'])),
((dfSigmamissing['Proportion'] == 0) & (dfSigmamissing['SP'] < dfSigmamissing['SigmaMultiple'])),
((dfSigmamissing['SP'].abs() > (dfSigmamissing['SigmaMultiple'])))
]
choices = [1,0,1]
dfSigmamissing['Flagged'] = np.select(conditions, choices, default=0)
Any help would be greatly appreciated.
Thank you
Upvotes: 1
Views: 903
Reputation: 18306
You're very close I think, you need to remove the second condition from the conditions
list, because if the first condition fails, np.select
will fall back there. Otherwise, one of the first two conditions are (almost) always True since they are (almost) complementary. Also, we need
dfSigmamissing['Proportion'] == 0
for the last condition, too:
conditions = [
(dfSigmamissing["Proportion"] == 0) & (dfSigmamissing["SP"] > dfSigmamissing["SigmaMultiple"]),
(dfSigmamissing["Proportion"] == 0) & (dfSigmamissing["SP"].abs() > dfSigmamissing["SigmaMultiple"])
]
choices = [1, 1]
default = 0
dfSigmamissing["Flagged"] = np.select(conditions, choices, default)
This works as:
check first condition holds?
-- if so put 1 in the corresponding row
-- if not, go to second condition
does the second condition hold?
-- if so, put 1 in the corresponding row
-- if not, fallback to default value, i.e., 0
To avoid repetition in the code, we can refactor some condition elements:
zero_prop = dfSigmamissing["Proportion"] == 0
sp = dfSigmamissing["SP"]
sigma_mul = dfSigmamissing["SigmaMultiple"]
conditions = [
zero_prop & (sp > sigma_mul),
zero_prop & (sp.abs() > sigma_mul)
]
which might increase readibility.
We can go even further and glue the first two conditions since both will output 1. This means we have 2 conditions now, so we can go for np.where
which is this special case of np.select
. Furhter, as far as I see, sp > sigma_mul
is contained in sp.abs() > sigma_mul
, so we can drop the former:
condition = zero_prop & (sp.abs() > sigma_mul)
dfSigmamissing["Flagged"] = np.where(condition, 1, 0)
Upvotes: 1