AndrewK
AndrewK

Reputation: 27

Nested Case when statements in Python

The following code creates as column called ‘Flagged’ in SAS:

Case When t3.Proportion=. then case when t3.'Standardised proportion'n >t1.SigmaMultiple Then 1 else 0  
End
Else 
Case When t3.Proportion=. and  abs(t3.'Standardised proportion'n) > t1.SigmaMultiple Then 1 Else 0
End
End

I'm trying to replicate it in python and usually I would have done a conditions code however the nested Case when aspect is confusing me.

The code i tried but didn't seem to be matching:

conditions =[
     ((dfSigmamissing['Proportion'] == 0) & (dfSigmamissing['SP'] > dfSigmamissing['SigmaMultiple'])),
       ((dfSigmamissing['Proportion'] == 0) & (dfSigmamissing['SP'] < dfSigmamissing['SigmaMultiple'])),
        ((dfSigmamissing['SP'].abs() > (dfSigmamissing['SigmaMultiple'])))
                ]

choices = [1,0,1]

dfSigmamissing['Flagged'] = np.select(conditions, choices, default=0)

Any help would be greatly appreciated.

Thank you

Upvotes: 1

Views: 903

Answers (1)

Mustafa Aydın
Mustafa Aydın

Reputation: 18306

You're very close I think, you need to remove the second condition from the conditions list, because if the first condition fails, np.select will fall back there. Otherwise, one of the first two conditions are (almost) always True since they are (almost) complementary. Also, we need dfSigmamissing['Proportion'] == 0 for the last condition, too:

conditions = [
     (dfSigmamissing["Proportion"] == 0) & (dfSigmamissing["SP"] > dfSigmamissing["SigmaMultiple"]),
     (dfSigmamissing["Proportion"] == 0) & (dfSigmamissing["SP"].abs() > dfSigmamissing["SigmaMultiple"])
]

choices = [1, 1]

default = 0

dfSigmamissing["Flagged"] = np.select(conditions, choices, default)

This works as:

  • check first condition holds?

    -- if so put 1 in the corresponding row

    -- if not, go to second condition

  • does the second condition hold?

    -- if so, put 1 in the corresponding row

    -- if not, fallback to default value, i.e., 0

To avoid repetition in the code, we can refactor some condition elements:

zero_prop = dfSigmamissing["Proportion"] == 0
sp = dfSigmamissing["SP"]
sigma_mul = dfSigmamissing["SigmaMultiple"]

conditions = [
    zero_prop & (sp > sigma_mul),
    zero_prop & (sp.abs() > sigma_mul)
]

which might increase readibility.

We can go even further and glue the first two conditions since both will output 1. This means we have 2 conditions now, so we can go for np.where which is this special case of np.select. Furhter, as far as I see, sp > sigma_mul is contained in sp.abs() > sigma_mul, so we can drop the former:

condition = zero_prop & (sp.abs() > sigma_mul)
dfSigmamissing["Flagged"] = np.where(condition, 1, 0)

Upvotes: 1

Related Questions