SMO

Reputation: 149

If a column contains a string from an array of strings, create a new column with

I have a data frame that looks like this:

     Col1     Col2    
0     22     Apple
1     43     Carrot 
2     54     Orange
3     74     Spinach
4     14     Cucumber 

I need to add a new column with the category "Fruit", "Vegetable" or "Leaf". I created a list for each category:

Fru = {'Apple','Orange', 'Grape', 'Blueberry', 'Strawberry'}
Veg = {'Cucumber','Carrot','Broccoli', 'Onion'}
Leaf = {'Lettuce', 'Kale', 'Spinach'}

And the result should look like this

    Col1      Col2     Category 
0     22     Apple      Fruit
1     43     Carrot     Vegetable 
2     54     Orange     Fruit
3     74     Spinach    Leaf
4     14     Cucumber   Vegetable

I tried np.where and str.contains, yet both give: TypeError: 'in <string>' requires string as left operand, not set

Upvotes: 1

Views: 1020

Answers (3)

ManojK

Reputation: 1640

Another approach is a plain for loop:

import pandas as pd

df = pd.DataFrame({'Col1': [22,43,54,74,14], 'Col2': ['Apple','Carrot','Orange','Spinach','Cucumber']})

Fruit = ['Apple','Orange', 'Grape', 'Blueberry', 'Strawberry']
Vegetable = ['Cucumber','Carrot','Broccoli', 'Onion']
Leaf = ['Lettuce', 'Kale', 'Spinach']

mylist = []
for i in df['Col2']:
    if i in Fruit:
        mylist.append('Fruit')
    elif i in Vegetable:
        mylist.append('Vegetable')
    elif i in Leaf:
        mylist.append('Leaf')
    else:
        # keep the list the same length as the frame when nothing matches
        mylist.append(None)

df['Category'] = mylist

print(df)
   Col1      Col2   Category
0    22     Apple      Fruit
1    43    Carrot  Vegetable
2    54    Orange      Fruit
3    74   Spinach       Leaf
4    14  Cucumber  Vegetable
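
If Col2 can hold a value that is in none of the three lists, the loop needs a fallback so the result stays the same length as the frame. A minimal sketch of the same idea as a function: the helper name `categorize`, the extra 'Pumpkin' row and the 'Other' label are assumptions for illustration, not part of the question.

```python
import pandas as pd

# 'Pumpkin' is a made-up row that matches no category
df = pd.DataFrame({'Col1': [22, 43, 99], 'Col2': ['Apple', 'Carrot', 'Pumpkin']})

Fruit = ['Apple', 'Orange', 'Grape', 'Blueberry', 'Strawberry']
Vegetable = ['Cucumber', 'Carrot', 'Broccoli', 'Onion']
Leaf = ['Lettuce', 'Kale', 'Spinach']

def categorize(name, fallback='Other'):
    # hypothetical helper: return the first category whose list contains name
    if name in Fruit:
        return 'Fruit'
    if name in Vegetable:
        return 'Vegetable'
    if name in Leaf:
        return 'Leaf'
    return fallback

# one label per row, so the assignment never fails on a length mismatch
df['Category'] = [categorize(name) for name in df['Col2']]
print(df)
```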

Upvotes: 1

Celius Stingher

Reputation: 18377

That's because Fru, Veg and Leaf are sets, not lists, as the error message shows. Series.isin accepts a set (or a list) directly, so you can pass each one to .isin() and nest np.where:

import pandas as pd
import numpy as np
df = pd.DataFrame({'Col1':[22,43,54,74,14],'Col2':['Apple','Carrot','Orange','Spinach','Cucumber']})

Fru = {'Apple','Orange', 'Grape', 'Blueberry', 'Strawberry'}
Veg = {'Cucumber','Carrot','Broccoli', 'Onion'}
Leaf = {'Lettuce', 'Kale', 'Spinach'}

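# np.where needs both a "true" and a "false" value, so the innermost call falls back to None
df['Category'] = np.where(df['Col2'].isin(Fru), 'Fruit',
                 np.where(df['Col2'].isin(Veg), 'Vegetable',
                 np.where(df['Col2'].isin(Leaf), 'Leaf', None)))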
print(df)

Output:

   Col1      Col2   Category
0    22     Apple      Fruit
1    43    Carrot  Vegetable
2    54    Orange      Fruit
3    74   Spinach       Leaf
4    14  Cucumber  Vegetable
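
To see where the message comes from, here is one way to reproduce it, assuming the failing call ended up evaluating a set against a string with `in` (the question does not show the exact call). The three-row Series is made up for the demonstration.

```python
import pandas as pd

Fru = {'Apple', 'Orange', 'Grape', 'Blueberry', 'Strawberry'}

# `in` with a string on the right expects a string on the left, so a set fails
try:
    Fru in 'Apple'
    message = None
except TypeError as exc:
    message = str(exc)
print(message)

# Series.isin takes the whole set and returns one boolean per row
mask = pd.Series(['Apple', 'Carrot', 'Orange']).isin(Fru)
print(mask.tolist())
```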

Upvotes: 2

jezrael

Reputation: 863226

Use Series.map with a new dictionary d1 that maps each item to its category:

Fru = {'Apple','Orange', 'Grape', 'Blueberry', 'Strawberry'}
Veg = {'Cucumber','Carrot','Broccoli', 'Onion'}
Leaf = {'Lettuce', 'Kale', 'Spinach'}

d = {'Fruit':Fru, 'Vegetable':Veg,'Leaf':Leaf}

# invert d so that each item becomes a key and its category name the value
#http://stackoverflow.com/a/31674731/2901002
d1 = {k: oldk for oldk, oldv in d.items() for k in oldv}
print (d1)

df['Category'] = df['Col2'].map(d1)
print (df)
   Col1      Col2   Category
0    22     Apple      Fruit
1    43    Carrot  Vegetable
2    54    Orange      Fruit
3    74   Spinach       Leaf
4    14  Cucumber  Vegetable
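
Values of Col2 that are not keys in d1 come back from map as NaN. A short sketch of how that looks and one way to fill the gap, using shortened sets, a made-up 'Pumpkin' entry and an illustrative 'Other' label:

```python
import pandas as pd

# shortened category sets, just for this demonstration
d = {'Fruit': {'Apple', 'Orange'}, 'Vegetable': {'Carrot'}, 'Leaf': {'Spinach'}}
d1 = {item: category for category, items in d.items() for item in items}

s = pd.Series(['Apple', 'Spinach', 'Pumpkin'])
mapped = s.map(d1)               # 'Pumpkin' is not a key in d1, so it becomes NaN
filled = mapped.fillna('Other')  # 'Other' is an illustrative fallback label
print(filled.tolist())
```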

Or use numpy.select:

df['Category'] = np.select([df['Col2'].isin(Fru), df['Col2'].isin(Veg), df['Col2'].isin(Leaf)],
                           ['Fruit', 'Vegetable', 'Leaf'],
                           default=None)  # unmatched rows stay empty; the implicit default is the integer 0
print (df)

   Col1      Col2   Category
0    22     Apple      Fruit
1    43    Carrot  Vegetable
2    54    Orange      Fruit
3    74   Spinach       Leaf
4    14  Cucumber  Vegetable
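
np.select checks the conditions in order and takes the choice of the first one that is true; rows where none is true get the `default` value. A small sketch with a made-up 'Pumpkin' entry and an illustrative 'Other' label:

```python
import numpy as np
import pandas as pd

Fru = {'Apple', 'Orange', 'Grape', 'Blueberry', 'Strawberry'}
Veg = {'Cucumber', 'Carrot', 'Broccoli', 'Onion'}
Leaf = {'Lettuce', 'Kale', 'Spinach'}

s = pd.Series(['Apple', 'Carrot', 'Pumpkin'])

# one boolean mask per category, in the same order as the labels
conditions = [s.isin(Fru), s.isin(Veg), s.isin(Leaf)]
labels = ['Fruit', 'Vegetable', 'Leaf']

# a string default keeps every value in the result a string
result = np.select(conditions, labels, default='Other')
print(result.tolist())
```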

Upvotes: 1
