Xiaowan Wen
Xiaowan Wen

Reputation: 145

How to handle multiple conditions in np.where()? not using .apply, as it is too slow

Updated: Having a survey question data set, i need to categorize different answers to to different categories:

Index,Q1,Q2,Q3,Q4,Q5,Q6,Q7
1,TRUE,FALSE,TRUE,TRUE,FALSE,TRUE,FALSE
2,TRUE,FALSE,FALSE,TRUE,FALSE,FALSE,TRUE
3,FALSE,FALSE,FALSE,TRUE,TRUE,FALSE,TRUE
4,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE
5,TRUE,FALSE,TRUE,FALSE,TRUE,TRUE,FALSE
6,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,TRUE
7,FALSE,FALSE,TRUE,TRUE,TRUE,TRUE,FALSE
8,TRUE,TRUE,FALSE,FALSE,TRUE,FALSE,FALSE
9,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE
10,FALSE,TRUE,TRUE,FALSE,FALSE,TRUE,TRUE
11,TRUE,TRUE,FALSE,TRUE,FALSE,FALSE,TRUE

The rules to generate category output (if not mentioned either TRUE or FALSE is acceptable):

                                                      Output
Rule 1  Q1 = TRUE   Q2 = TRUE   Q5 = TRUE             High
Rule 2  Q1 = TRUE   Q2 = TRUE   Q4 = TRUE   Q7=TRUE   High
Rule 3  Q2 = TRUE   Q3 = TRUE   Q6 = TRUE             Mid
Rule 4  Q2 = TRUE   Q3 = TRUE   Q7 = TRUE             Mid
Rule 5  Q4 = TRUE   Q7 = TRUE   Q1 = FALSE  Q2= FALSE Mid-LOW
Rule 6  all = FALSE                                   LOW

I have tried the following code:

c1=df['Q1'].eq('TRUE')
c2=df['Q2'].eq('TRUE')
c3=df['Q5'].eq('TRUE')
c4=df['Q4'].eq('TRUE')
c5=df['Q5'].eq('TRUE')
c6=df['Q6'].eq('TRUE')
c7=df['Q7'].eq('TRUE')
df['Restock Action']=np.where((c1&c2&c5) | (c1&c2&c4&c7),'high','')

Question: How can i handle this situation where: 1. multiple rules have same outcome 2. there are more than one outcomes (i.e. high, mid, mid-low, low)

Thank you!

Upvotes: 0

Views: 83

Answers (1)

Erfan
Erfan

Reputation: 42916

Since you have more than two conditions, you should use np.select where you specify multiple conditions and based on those conditions choices.

We use the NOT operator (~) to catch the False for c1 and c2, since these are the inverse.

conditions = [
    (c1&c2&c5) | (c1&c2&c4&c7),
    (c2&c3&c6) | (c2&c3&c7),
    c4%c7&~c1&~c2
]

choices = ['High', 'Mid', 'Mid-Low']

df['Restock Action'] = np.select(conditions, choices, default='Low')

output

       Q1     Q2     Q3     Q4     Q5     Q6     Q7 Restock Action
0    True  False   True   True  False   True  False            Low
1    True  False  False   True  False  False   True            Low
2   False  False  False   True   True  False   True            Low
3   False  False  False  False  False  False  False            Low
4    True  False   True  False   True   True  False            Low
5   False  False  False  False   True  False   True            Low
6   False  False   True   True   True   True  False            Low
7    True   True  False  False   True  False  False           High
8   False  False  False  False  False  False  False            Low
9   False   True   True  False  False   True   True            Low
10   True   True  False   True  False  False   True           High

Upvotes: 3

Related Questions