marvinrae

Reputation: 5

How do I search for a keyword in a string, extract that string, and place it in a new column?

I'm using Pandas. Here's my df:

df = {'Product Name': ['Nike Zoom Pegasus', 'All New Nike Zoom Pegasus 4', 'Metcon 3', 'Nike Metcon 5']}

I'd like to search each string value and extract just the product category and then put that extracted string value in another column ("Category"). As you may notice, the product names do not have a formal naming convention so .split() would not be ideal to use.

The end result should look like this:

df = {'Product Name': ['Nike Zoom Pegasus', 'All New Nike Zoom Pegasus 4', 'Metcon 3', 'Nike Metcon 5'], 'Category': ['Pegasus', 'Pegasus', 'Metcon', 'Metcon']}

My current code is below, but I'm getting an error:

def get_category(product):
    if df['Product Name'].str.contains('Pegasus') or df['Product Name'].str.contains('Metcon'):
        return product

df['Category'] = df['Product Name'].apply(lambda x: get_category(x))

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Hope you can help. Thanks!

Upvotes: 0

Views: 72

Answers (4)

sushanth

Reputation: 8302

Use pandas.Series.str.extract:

>>> import pandas as pd
>>> df = pd.DataFrame({'Product Name': ['Nike Zoom Pegasus', 'All New Nike Zoom Pegasus 4', 'Metcon 3', 'Nike Metcon 5']})
>>> cats = ["Pegasus", "Metcon"]
>>> df['Category'] = df["Product Name"].str.extract("(%s)" % "|".join(cats))
>>> df
                  Product Name Category
0            Nike Zoom Pegasus  Pegasus
1  All New Nike Zoom Pegasus 4  Pegasus
2                     Metcon 3   Metcon
3                Nike Metcon 5   Metcon
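
A slightly more defensive variant of the same idea, as a sketch rather than a definitive version: re.escape guards against category names that happen to contain regex metacharacters, and expand=False makes str.extract return a Series instead of a one-column DataFrame. The "Air Max 90" row is a made-up example, added only to show that names with no known category come back as NaN.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "Product Name": [
        "Nike Zoom Pegasus",
        "All New Nike Zoom Pegasus 4",
        "Metcon 3",
        "Nike Metcon 5",
        "Air Max 90",  # hypothetical row with no known category
    ]
})
cats = ["Pegasus", "Metcon"]

# Build a single capture group such as (Pegasus|Metcon),
# escaping each name so it is matched literally.
pattern = "(" + "|".join(re.escape(c) for c in cats) + ")"

# expand=False returns a Series; rows with no match become NaN.
df["Category"] = df["Product Name"].str.extract(pattern, expand=False)
print(df)
```

If NaN is not wanted, something like df["Category"].fillna("Unknown") could be chained on afterwards.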

Upvotes: 0

Bernardo Trindade

Reputation: 481

How about:

import pandas as pd

df = {'Product Name': ['Nike Zoom Pegasus', 'All New Nike Zoom Pegasus 4', 'Metcon 3', 'Nike Metcon 5']}

c = set(['Metcon', 'Pegasus'])
categories = [c.intersection(pn.split(' ')) for pn in df['Product Name']]
df['Categories'] = categories

print(df)
>> {'Product Name': ['Nike Zoom Pegasus', 'All New Nike Zoom Pegasus 4', 'Metcon 3', 'Nike Metcon 5'], 'Categories': [{'Pegasus'}, {'Pegasus'}, {'Metcon'}, {'Metcon'}]}
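
Note that the snippet above works on a plain dict and stores a set per row. If a real DataFrame with a plain string column is wanted, the same set-intersection idea could be adapted roughly as follows. This is only a sketch: first_known_word is a hypothetical helper name, and returning None for unmatched names is an assumption about the desired behaviour.

```python
import pandas as pd

df = pd.DataFrame({
    "Product Name": [
        "Nike Zoom Pegasus",
        "All New Nike Zoom Pegasus 4",
        "Metcon 3",
        "Nike Metcon 5",
    ]
})
known = {"Metcon", "Pegasus"}


def first_known_word(name):
    """Return a known category that appears as a whole word in name, else None."""
    # Whole-word match: split on whitespace and intersect with the known set.
    hits = known.intersection(name.split())
    # sorted() makes the choice deterministic if several categories match.
    return sorted(hits)[0] if hits else None


df["Category"] = df["Product Name"].map(first_known_word)
print(df)
```

Because this compares whole words, a name like "Pegasus4" (no space) would not match, unlike the substring-based approaches.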

Upvotes: 0

Rajith Thennakoon

Reputation: 4130

How about this solution? When you have a new category, all you have to do is add it to the cats list.

import pandas as pd
import numpy as np

df = pd.DataFrame({'Product Name': ['Nike Zoom Pegasus', 'All New Nike Zoom Pegasus 4', 'Metcon 3', 'Nike Metcon 5']})
cats = ["Pegasus","Metcon"]
df["Category"] = df["Product Name"].apply(lambda x: np.intersect1d(x.split(" "),cats)[0])


Output:
                  Product Name Category
0            Nike Zoom Pegasus  Pegasus
1  All New Nike Zoom Pegasus 4  Pegasus
2                     Metcon 3   Metcon
3                Nike Metcon 5   Metcon

Upvotes: 1

pecey

Reputation: 681

The problems with your code are as follows:

  • You pass in product, but the check uses df["Product Name"], which is the entire Series. Python cannot reduce a whole Series to a single True or False for the if/or, which is what raises the ValueError.
  • The function returns product, but according to the expected output it should return either "Pegasus" or "Metcon".

I think you want something like this.

def get_category(product):
    if "Pegasus" in product:
        return "Pegasus" 
    elif "Metcon" in product:
        return "Metcon"
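
Put together with the apply call from the question, a runnable sketch might look like this. The explicit return None at the end is my assumption about what should happen for names that match neither category; without it the function returns None implicitly anyway.

```python
import pandas as pd


def get_category(product):
    # product is a single string here, not the whole column.
    if "Pegasus" in product:
        return "Pegasus"
    elif "Metcon" in product:
        return "Metcon"
    return None  # assumption: no known category found


df = pd.DataFrame({
    "Product Name": [
        "Nike Zoom Pegasus",
        "All New Nike Zoom Pegasus 4",
        "Metcon 3",
        "Nike Metcon 5",
    ]
})

# apply calls get_category once per value, so no lambda wrapper is needed.
df["Category"] = df["Product Name"].apply(get_category)
print(df)
```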

Upvotes: 0
