Reputation: 149
I have a data frame that looks like this:
   Col1      Col2
0    22     Apple
1    43    Carrot
2    54    Orange
3    74   Spinach
4    14  Cucumber
I need to add a new column with the category "Fruit", "Vegetable" or "Leaf". I created a list for each category:
Fru = {'Apple','Orange', 'Grape', 'Blueberry', 'Strawberry'}
Veg = {'Cucumber','Carrot','Broccoli', 'Onion'}
Leaf = {'Lettuce', 'Kale', 'Spinach'}
The result should look like this:
   Col1      Col2   Category
0    22     Apple      Fruit
1    43    Carrot  Vegetable
2    54    Orange      Fruit
3    74   Spinach       Leaf
4    14  Cucumber  Vegetable
I tried np.where and contains, yet both give: TypeError: 'in <string>' requires string as left operand, not set
Upvotes: 1
Views: 1020
Reputation: 1640
Another approach you can try is a for loop:
df = pd.DataFrame({'Col1': [22,43,54,74,14], 'Col2': ['Apple','Carrot','Orange','Spinach','Cucumber']})
Fruit = ['Apple','Orange', 'Grape', 'Blueberry', 'Strawberry']
Vegetable = ['Cucumber','Carrot','Broccoli', 'Onion']
Leaf = ['Lettuce', 'Kale', 'Spinach']
mylist = []
for i in df['Col2']:
    if i in Fruit:
        mylist.append('Fruit')
    elif i in Vegetable:
        mylist.append('Vegetable')
    elif i in Leaf:
        mylist.append('Leaf')
    else:
        # keep mylist the same length as df when nothing matches
        mylist.append(None)
df['Category'] = mylist
print(df)
   Col1      Col2   Category
0    22     Apple      Fruit
1    43    Carrot  Vegetable
2    54    Orange      Fruit
3    74   Spinach       Leaf
4    14  Cucumber  Vegetable
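The same membership checks can also be wrapped in a function and passed to Series.apply, which avoids building the list by hand. This is only a minimal sketch: the helper name `categorize`, the extra 'Rice' row and the 'Unknown' label for values found in none of the collections are assumptions for illustration, not part of the original post.

```python
import pandas as pd

# Sets give fast membership tests; names follow the question.
Fru = {'Apple', 'Orange', 'Grape', 'Blueberry', 'Strawberry'}
Veg = {'Cucumber', 'Carrot', 'Broccoli', 'Onion'}
Leaf = {'Lettuce', 'Kale', 'Spinach'}

def categorize(name):
    """Hypothetical helper: return the category label for one value."""
    if name in Fru:
        return 'Fruit'
    if name in Veg:
        return 'Vegetable'
    if name in Leaf:
        return 'Leaf'
    # Assumed placeholder for values that are in none of the sets.
    return 'Unknown'

# 'Rice' is an extra row added here to exercise the fallback branch.
df = pd.DataFrame({'Col1': [22, 43, 54, 74, 14, 9],
                   'Col2': ['Apple', 'Carrot', 'Orange', 'Spinach', 'Cucumber', 'Rice']})
df['Category'] = df['Col2'].apply(categorize)
print(df)
```

Because the function returns a value for every row, the new column always has the same length as the frame.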
Upvotes: 1
Reputation: 18377
That's because you created sets, not lists, as your error shows. Series.isin() accepts a set (or a list built from it) as its argument:
import pandas as pd
import numpy as np
df = pd.DataFrame({'Col1':[22,43,54,74,14],'Col2':['Apple','Carrot','Orange','Spinach','Cucumber']})
Fru = {'Apple','Orange', 'Grape', 'Blueberry', 'Strawberry'}
Veg = {'Cucumber','Carrot','Broccoli', 'Onion'}
Leaf = {'Lettuce', 'Kale', 'Spinach'}
# np.where needs both outcomes, so the innermost call falls back to None
df['Category'] = np.where(df['Col2'].isin(Fru), 'Fruit',
                 np.where(df['Col2'].isin(Veg), 'Vegetable',
                 np.where(df['Col2'].isin(Leaf), 'Leaf', None)))
print(df)
Output:
   Col1      Col2   Category
0    22     Apple      Fruit
1    43    Carrot  Vegetable
2    54    Orange      Fruit
3    74   Spinach       Leaf
4    14  Cucumber  Vegetable
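The asker's failing code is not shown, so the following is only a guess at how the error in the question arises: a plain Python `in` test with a set on the left and a string on the right. The sketch contrasts that with Series.isin, which tests each element against the set. The variable names `col`, `message` and `mask` are assumptions for illustration.

```python
import pandas as pd

Fru = {'Apple', 'Orange', 'Grape', 'Blueberry', 'Strawberry'}
col = pd.Series(['Apple', 'Carrot', 'Orange'])

# A substring test needs a string on the left, so a set raises TypeError.
try:
    Fru in 'Apple'
except TypeError as exc:
    message = str(exc)
    print(message)

# isin checks every element of the Series against the set's members.
mask = col.isin(Fru)
print(mask.tolist())  # [True, False, True]
```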
Upvotes: 2
Reputation: 863226
Use Series.map with a new dictionary d1:
Fru = {'Apple','Orange', 'Grape', 'Blueberry', 'Strawberry'}
Veg = {'Cucumber','Carrot','Broccoli', 'Onion'}
Leaf = {'Lettuce', 'Kale', 'Spinach'}
d = {'Fruit':Fru, 'Vegetable':Veg,'Leaf':Leaf}
#swap key values in dict
#http://stackoverflow.com/a/31674731/2901002
d1 = {k: oldk for oldk, oldv in d.items() for k in oldv}
print (d1)
df['Category'] = df['Col2'].map(d1)
print (df)
   Col1      Col2   Category
0    22     Apple      Fruit
1    43    Carrot  Vegetable
2    54    Orange      Fruit
3    74   Spinach       Leaf
4    14  Cucumber  Vegetable
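One thing to keep in mind with map: a value that is not a key of the dictionary comes back as NaN. A minimal sketch of that behaviour, assuming a made-up value 'Rice' and a placeholder label 'Unknown' (neither is from the original post):

```python
import pandas as pd

d = {'Fruit': {'Apple', 'Orange', 'Grape', 'Blueberry', 'Strawberry'},
     'Vegetable': {'Cucumber', 'Carrot', 'Broccoli', 'Onion'},
     'Leaf': {'Lettuce', 'Kale', 'Spinach'}}

# Invert the dict: each item becomes a key pointing at its category.
d1 = {item: category for category, items in d.items() for item in items}

s = pd.Series(['Apple', 'Rice', 'Kale'])

# 'Rice' is not a key of d1, so map leaves NaN in its place.
mapped = s.map(d1)

# fillna swaps the NaN for an assumed placeholder label.
filled = mapped.fillna('Unknown')
print(filled.tolist())  # ['Fruit', 'Unknown', 'Leaf']
```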
Or use numpy.select:
# default labels rows that match no condition (np.select's own default is 0)
df['Category'] = np.select([df['Col2'].isin(Fru), df['Col2'].isin(Veg), df['Col2'].isin(Leaf)],
                           ['Fruit', 'Vegetable', 'Leaf'],
                           default='Unknown')
print (df)
   Col1      Col2   Category
0    22     Apple      Fruit
1    43    Carrot  Vegetable
2    54    Orange      Fruit
3    74   Spinach       Leaf
4    14  Cucumber  Vegetable
Upvotes: 1