Reputation: 1219
I was reading a blog post about condition-based computations, where a new column 'category' is inserted:
import numpy as np
import pandas as pd
data = {'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
'age': [42, 52, 36, 24, 73],
'preTestScore': [4, 24, 31, 2, 3],
'postTestScore': [25, 94, 57, 62, 70]}
df = pd.DataFrame(data, columns = ['name', 'age', 'preTestScore', 'postTestScore'])
df['category'] = np.where(df['age']>=50, 'yes', 'no')
How can this be extended to multiple conditions, e.g. if age is less than 20 then 'kid', if between 21 and 40 then 'young', and if above 40 then 'old'?
Upvotes: 2
Views: 491
Reputation: 323316
You can use pd.cut
(BTW, 40 is not an old man :-( )
pd.cut(df.age,bins=[0,20,39,np.inf],labels=['kid','young','old'])
Out[179]:
0 old
1 old
2 young
3 young
4 old
Name: age, dtype: category
Categories (3, object): [kid < young < old]
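One thing to watch with pd.cut is which side of each bin is closed. By default the bins are closed on the right, so with bins=[0,20,39,np.inf] an age of exactly 20 lands in 'kid'. If "less than 20" should be strict, a minimal sketch (assuming the cut-offs 20 and 40, and a small hypothetical `edge` series just to show the boundaries) is to pass right=False:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
                   'age': [42, 52, 36, 24, 73]})

# right=False closes each bin on the left: [0, 20), [20, 40), [40, inf)
bins = [0, 20, 40, np.inf]
labels = ['kid', 'young', 'old']
df['category'] = pd.cut(df['age'], bins=bins, labels=labels, right=False)

# Hypothetical boundary ages, only to show where 19, 20, 39 and 40 end up
edge = pd.cut(pd.Series([19, 20, 39, 40]), bins=bins, labels=labels, right=False)
print(df)
print(edge.tolist())
```

With these bins, 19 is 'kid', 20 and 39 are 'young', and 40 is 'old'. Adjust the edges if your own definition of the groups differs.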
Upvotes: 1
Reputation: 59579
For multiple conditions, you can use numpy.select instead of numpy.where:
import numpy as np
cond = [df['age'] < 20, df['age'].between(20, 39), df['age'] >= 40]
choice = ['kid', 'young', 'old']
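# Pass a string default: newer NumPy versions can raise a TypeError when the
# integer default (0) is combined with string choices
df['category'] = np.select(cond, choice, default='unknown')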
# name age preTestScore postTestScore category
#0 Jason 42 4 25 old
#1 Molly 52 24 94 old
#2 Tina 36 31 57 young
#3 Jake 24 2 62 young
#4 Amy 73 3 70 old
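If some rows match none of the conditions (for example a missing age), numpy.select falls back to its default argument. A small sketch, assuming a hypothetical `ages` frame with one NaN and an 'unknown' label that is my own placeholder, not part of the question:

```python
import numpy as np
import pandas as pd

# Hypothetical sample with boundary values and one missing age
ages = pd.DataFrame({'age': [12, 20, 39, 40, np.nan]})

cond = [ages['age'] < 20, ages['age'].between(20, 39), ages['age'] >= 40]
choice = ['kid', 'young', 'old']

# Comparisons against NaN are all False, so that row gets the default label
ages['category'] = np.select(cond, choice, default='unknown')
print(ages)
```

Conditions are checked in order and the first match wins, so overlapping conditions should be listed from most to least specific.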
Upvotes: 5