John

Reputation: 57

Creating a column in a pandas dataframe based on another column

I have a pandas dataframe where one column contains only strings.

import pandas as pd

df = pd.DataFrame(
  {
    "A": [2,4,7,17,39], 
    "B": ["apple","apple","broccoli","rose","apple"]
  }
)

I want to examine column "B" and find every row whose string contains a certain part of a word. Then I want to create a new column "C" that contains "fruit" whenever "app" is in the row, "flower" whenever "ros" shows up, and "vegetable" whenever "brocc" shows up.

The final dataframe will look like:

df = pd.DataFrame(
  {
    "A": [2,4,7,17,39], 
    "B": ["apple","apple","broccoli","rose","apple"], 
    "C": ["fruit","fruit", "vegetable", "flower", "fruit"]
  }
)

Upvotes: 2

Views: 268

Answers (2)

MSeifert

Reputation: 152647

You could use a dictionary as a converter and pass its get method to apply:

converter = {'apple': 'fruit',
             'broccoli': 'veg',
             'rose': 'flower'}

df['C'] = df['B'].apply(converter.get)
print(df)
    A         B       C
0   2     apple   fruit
1   4     apple   fruit
2   7  broccoli     veg
3  17      rose  flower
4  39     apple   fruit

For partial matching, you need to change this slightly:

converter = {'app': 'fruit',
             'brocc': 'vegetable',
             'ros': 'flower'}

df['C'] = df['B'].apply(lambda original: next(val for key, val in converter.items() if key in original))
print(df)
    A         B          C
0   2     apple      fruit
1   4     apple      fruit
2   7  broccoli  vegetable
3  17      rose     flower
4  39     apple      fruit

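The expression next(val for key, val in converter.items() if key in original) returns the dictionary value for the first key found in that row's string. If no key matches, next raises StopIteration; passing a default, as in next((val for key, val in converter.items() if key in original), None), avoids that.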
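The same substring lookup can also be done without a per-row lambda. The following is a minimal sketch, not part of the original answer: it swaps in str.contains with numpy.select, and the "unknown" fallback label is an assumption added here for rows that match no key.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "A": [2, 4, 7, 17, 39],
        "B": ["apple", "apple", "broccoli", "rose", "apple"],
    }
)

converter = {"app": "fruit", "brocc": "vegetable", "ros": "flower"}

# One boolean mask per substring; regex=False treats each key as literal text.
conditions = [df["B"].str.contains(key, regex=False) for key in converter]
choices = list(converter.values())

# np.select picks, per row, the choice for the first condition that is True.
# "unknown" is a hypothetical label for rows that match none of the keys.
df["C"] = np.select(conditions, choices, default="unknown")
print(df)
```

As with the next-based version, the first matching key wins, so the order of the dictionary matters if a string could contain more than one key.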

Upvotes: 2

user2285236

Create a dictionary:

d = {'apple': 'fruit', 'broccoli': 'vegetable', 'rose': 'flower'}

Then use it with map or replace (map is faster for this):

df['B'].map(d)
Out: 
0        fruit
1        fruit
2    vegetable
3       flower
4        fruit
Name: B, dtype: object

df['B'].replace(d)
Out: 
0        fruit
1        fruit
2    vegetable
3       flower
4        fruit
Name: B, dtype: object
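The two methods give the same result here because every value in "B" is a key of the dictionary. They differ once a value is missing from it, which the short sketch below illustrates. The series s and the value "carrot" are made up for this example.

```python
import pandas as pd

d = {"apple": "fruit", "broccoli": "vegetable", "rose": "flower"}

# "carrot" is a hypothetical value that is not a key in d.
s = pd.Series(["apple", "rose", "carrot"])

mapped = s.map(d)        # values missing from d become NaN
replaced = s.replace(d)  # values missing from d are left unchanged

print(mapped)
print(replaced)
```

Note that both methods match whole values only, so neither handles the partial matching ("app", "ros", "brocc") asked about in the question without further changes.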

Upvotes: 3
