Veki

Reputation: 541

Conditional mapping to a dataframe based on multiple columns

I have a dataframe where I need to map categories to a result based on value conditions on two separate columns. In total there are about a million rows to process.

Sample dataframe is:

import pandas as pd

df = pd.DataFrame({'col1': ['B', 'A', 'A', 'B', 'C', 'B', 'C', 'C', 'A'],
                   'col2': [10, 30, 40, 20, 60, 30, 70, 80, 50]})

Now, the conditions for True are:

  1. A: >30
  2. B: >20
  3. C: >60

If the value in col2 meets the condition for the category in col1, the result is True (1); otherwise it is False (0).

Expected outcome is:

   col1  col2  result
0     B    10       0
1     A    30       0
2     A    40       1
3     B    20       1
4     C    60       0
5     B    30       1
6     C    70       1
7     C    80       1
8     A    50       1

Upvotes: 1

Views: 234

Answers (2)

jezrael

Reputation: 863226

You can chain masks with | for bitwise OR:

df['result'] = ((df['col1']=='A') & (df['col2']>30) |
                (df['col1']=='B') & (df['col2']>10) |
                (df['col1']=='C') & (df['col2']>60))

Or:

import numpy as np

df['result'] = np.where((df['col1']=='A') & (df['col2']>30) |
                        (df['col1']=='B') & (df['col2']>10) |
                        (df['col1']=='C') & (df['col2']>60), 1, 0)
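
As a side note (not part of the answer above, just a sketch using the same thresholds, with B > 10 so it matches the expected output): you can also map each category to its threshold once and do a single comparison, which avoids repeating the col1 checks:

import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['B', 'A', 'A', 'B', 'C', 'B', 'C', 'C', 'A'],
                   'col2': [10, 30, 40, 20, 60, 30, 70, 80, 50]})

# Per-category thresholds; a category missing from the dict maps to NaN,
# and any comparison against NaN evaluates to False.
thresholds = {'A': 30, 'B': 10, 'C': 60}
df['result'] = (df['col2'] > df['col1'].map(thresholds)).astype(int)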

Upvotes: 1

Veki

Reputation: 541

I tried to do this as follows:

import numpy as np

df['result'] = np.select([(df['col1']=='A') & (df['col2']>30),
                          (df['col1']=='A') & (df['col2']<=30),
                          (df['col1']=='B') & (df['col2']>10),
                          (df['col1']=='B') & (df['col2']<=10),
                          (df['col1']=='C') & (df['col2']>60),
                          (df['col1']=='C') & (df['col2']<=60)],
                         [True, False, True, False, True, False])

However, I do not know if this is the optimal way to do it; other answers are welcome.
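
One possible simplification (just a sketch, not benchmarked): np.select accepts a default value, so the <= conditions can be dropped and only the True cases listed.

import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['B', 'A', 'A', 'B', 'C', 'B', 'C', 'C', 'A'],
                   'col2': [10, 30, 40, 20, 60, 30, 70, 80, 50]})

# Only the conditions that should yield 1 are listed;
# every other row falls through to default=0.
conditions = [(df['col1'] == 'A') & (df['col2'] > 30),
              (df['col1'] == 'B') & (df['col2'] > 10),
              (df['col1'] == 'C') & (df['col2'] > 60)]
df['result'] = np.select(conditions, [1, 1, 1], default=0)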

Upvotes: 0
