Reputation: 55
I have a dataframe
vehicle_make vehicle_model vehicle_year
Toyota Corolla 2016
Hyundai Sonata 2016
Cadillac DTS 2006
Toyota Prius 2014
Kia Optima 2015
I want to add a new column 'vehicle_make_category' that is populated based on two lists I have:
luxury=['Bentley',
'Maserati',
'Hummer',
'Porsche',
'Lexus']
non_luxury=['Saab',
'Mazda',
'Dodge',
'Volkswagen',
'Kia',
'Chevrolet',
'Hyundai',
'Ford',
'Nissan',
'Honda',
'Toyota'
]
How can I accomplish this? I have tried using
df['vehicle_make_category']=np.where(df['vehicle_make']=i for i in luxury, 'luxury')
but it doesn't work.
Upvotes: 2
Views: 3639
Reputation: 8631
You can create a lookup_df from the lists for non_luxury and luxury.
lookup_df = pd.DataFrame({
'vehicle_make': luxury + non_luxury,
'vehicle_make_category': (["luxury"] * len(luxury))+(["non_luxury"] * len(non_luxury))
})
Then left join it onto the original df that you have.
df.merge(lookup_df, how='left',left_on='vehicle_make', right_on='vehicle_make')
Output:
vehicle_make vehicle_model vehicle_year vehicle_make_category
0 Toyota Corolla 2016 non_luxury
1 Hyundai Sonata 2016 non_luxury
2 Cadillac DTS 2006 NaN
3 Toyota Prius 2014 non_luxury
4 Kia Optima 2015 non_luxury
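As a rough end-to-end sketch of the merge approach: the data is the sample from the question, while the shortened lists, the `validate` check, and the "unknown" fill label are assumptions added for illustration, not part of the answer above.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "vehicle_make": ["Toyota", "Hyundai", "Cadillac", "Toyota", "Kia"],
    "vehicle_model": ["Corolla", "Sonata", "DTS", "Prius", "Optima"],
    "vehicle_year": [2016, 2016, 2006, 2014, 2015],
})

# Shortened lists, for illustration only
luxury = ["Bentley", "Porsche", "Lexus"]
non_luxury = ["Kia", "Hyundai", "Toyota"]

lookup_df = pd.DataFrame({
    "vehicle_make": luxury + non_luxury,
    "vehicle_make_category": ["luxury"] * len(luxury) + ["non_luxury"] * len(non_luxury),
})

# merge returns a new frame, so keep the result; validate="m:1" raises
# if a make appears more than once in lookup_df
result = df.merge(lookup_df, how="left", on="vehicle_make", validate="m:1")

# Makes found in neither list come back as NaN; "unknown" is a hypothetical label
result["vehicle_make_category"] = result["vehicle_make_category"].fillna("unknown")
print(result)
```

The `validate` argument is optional, but it guards against a make that was accidentally placed in both lists, which would otherwise duplicate rows in the result.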
Upvotes: 0
Reputation: 911
You can create the column via list comprehension:
df['vehicle_make_category'] = [
'luxury' if row.vehicle_make in luxury
else 'non_luxury'
for _, row in df.iterrows()
]
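If the frame is large, iterating over rows can be slow. A possible alternative, sketched here with shortened lists and a hypothetical `category_by_make` dict, is to build one lookup dict and pass it to `Series.map`:

```python
import pandas as pd

df = pd.DataFrame({"vehicle_make": ["Toyota", "Hyundai", "Cadillac", "Toyota", "Kia"]})

# Shortened lists, for illustration only
luxury = ["Bentley", "Porsche", "Lexus"]
non_luxury = ["Kia", "Hyundai", "Toyota"]

# One dict mapping each make to its category (hypothetical helper name)
category_by_make = {make: "luxury" for make in luxury}
category_by_make.update({make: "non_luxury" for make in non_luxury})

# Series.map looks each make up in the dict; makes absent from it become NaN
df["vehicle_make_category"] = df["vehicle_make"].map(category_by_make)
print(df)
```

Note the difference in behaviour: the list comprehension labels every make outside luxury as non_luxury, whereas this version leaves makes found in neither list (Cadillac here) as NaN.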
Upvotes: 0
Reputation: 9264
Use isin, and give np.where a third argument that supplies the value for rows where the condition is false:
df['vehicle_make_category'] = np.where(df.vehicle_make.isin(luxury),'luxury','non-luxury')
vehicle_make vehicle_model vehicle_year vehicle_make_category
0 Toyota Corolla 2016 non-luxury
1 Hyundai Sonata 2016 non-luxury
2 Cadillac DTS 2006 non-luxury
3 Toyota Prius 2014 non-luxury
4 Kia Optima 2015 non-luxury
Using np.select, we can build a list of conditions and assign the value that goes with whichever condition is true, with a default for rows that match none:
conditions = [df.vehicle_make.isin(luxury),df.vehicle_make.isin(non_luxury)]
df['vehicle_make_category'] = np.select(conditions,['luxury','non-luxury'],default='no-category')
vehicle_make vehicle_model vehicle_year vehicle_make_category
0 Toyota Corolla 2016 non-luxury
1 Hyundai Sonata 2016 non-luxury
2 Cadillac DTS 2006 no-category
3 Toyota Prius 2014 non-luxury
4 Kia Optima 2015 non-luxury
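The sample data never reaches the 'luxury' branch, so here is a small sketch that exercises all three outcomes. The extra Lexus row and the shortened lists are assumptions added for illustration only:

```python
import numpy as np
import pandas as pd

# Makes from the question plus one hypothetical luxury row (Lexus)
df = pd.DataFrame({"vehicle_make": ["Toyota", "Cadillac", "Kia", "Lexus"]})

# Shortened lists, for illustration only
luxury = ["Bentley", "Porsche", "Lexus"]
non_luxury = ["Kia", "Hyundai", "Toyota"]

# Conditions are checked in order; the first one that is True picks the choice
conditions = [df["vehicle_make"].isin(luxury), df["vehicle_make"].isin(non_luxury)]
choices = ["luxury", "non-luxury"]

# Rows matching no condition receive the default
df["vehicle_make_category"] = np.select(conditions, choices, default="no-category")
print(df)
```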
Upvotes: 1
Reputation: 59274
Simply
df["vehicle_make_category"] = None
df.loc[df["vehicle_make"].isin(luxury), "vehicle_make_category"] = "luxury"
df.loc[df["vehicle_make"].isin(non_luxury), "vehicle_make_category"] = "non_luxury"
Upvotes: 3
Reputation: 1980
You can use df.join
You'll have to make a new dataframe identifying luxury/nonluxury.
veh = ['toyota', 'hyundai', 'cadillac']
yr = [2016, 2016, 2016]
# recreating your lux/non layout
n_lux = [veh[0], veh[1]]
lux = [veh[2]]
# then making a new column
b = ['non' if v in n_lux else 'lux' for v in veh]
# build the frames from dicts so the years stay numeric
A = pd.DataFrame({'vehicle_make': veh, 'vehicle_year': yr})
B = pd.DataFrame({'vehicle_make_category': b})
# join the two frames side by side on their shared index
A.join(B)
Upvotes: 0