Reputation: 541
I have a dataframe where I need to map categories based on value-based conditions on two separate columns. Total rows to do this are about a million.
Sample dataframe is:
df = pd.DataFrame({'col1':['B','A','A','B','C','B','C','C','A'],
'col2':[10,30,40,20,60,30,70,80,50]})
Now, the conditions for True are:
If the value in col2 are as per above condition, then the result is True(1), else False(0).
Expected outcome is:
col1 col2 result
0 B 10 0
1 A 30 0
2 A 40 1
3 B 20 1
4 C 60 0
5 B 30 1
6 C 70 1
7 C 80 1
8 A 50 1
Upvotes: 1
Views: 234
Reputation: 863226
You can chain masks by |
for bitwise OR
:
df['result'] = (df['col1']=='A') & (df['col2']>30) |
(df['col1']=='B') & (df['col2']>10) |
(df['col1']=='C') & (df['col2']>60)
Or:
df['result'] = np.where((df['col1']=='A') & (df['col2']>30) |
(df['col1']=='B') & (df['col2']>10) |
(df['col1']=='C') & (df['col2']>60), 1, 0)
Upvotes: 1
Reputation: 541
I tried to do this as under:
df['result'] = np.select([(df['col1']=='A') & (df['col2']>30),
(df['col1']=='A') & (df['col2']<=30),
(df['col1']=='B') & (df['col2']>10),
(df['col1']=='B') & (df['col2']<=10),
(df['col1']=='C') & (df['col2']>60),
(df['col1']=='C') & (df['col2']<=60),
],
[True,
False,
True,
False,
True,
False
]
)
However, I do not know if this is the optimal way to do this. Other answers are welcome.
Upvotes: 0