Reputation: 5
I'm using Pandas. Here's my df:
df = {'Product Name': ['Nike Zoom Pegasus', 'All New Nike Zoom Pegasus 4', 'Metcon 3', 'Nike Metcon 5']}
I'd like to search each string value and extract just the product category and then put that extracted string value in another column ("Category"). As you may notice, the product names do not have a formal naming convention so .split() would not be ideal to use.
The end result should look like this:
df = {'Product Name': ['Nike Zoom Pegasus', 'All New Nike Zoom Pegasus 4', 'Metcon 3', 'Nike Metcon 5'], 'Category': ['Pegasus', 'Pegasus', 'Metcon', 'Metcon']}
This is my current code, but I'm getting an error:
def get_category(product):
    if df['Product Name'].str.contains('Pegasus') or df['Product Name'].str.contains('Metcon'):
        return product

df['Category'] = df['Product Name'].apply(lambda x: get_category(x))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Hope you can help. Thanks!
Upvotes: 0
Views: 72
Reputation: 8302
You can use pandas.Series.str.extract:
>>> df = pd.DataFrame({'Product Name': ['Nike Zoom Pegasus', 'All New Nike Zoom Pegasus 4', 'Metcon 3', 'Nike Metcon 5']})
>>> cats = ["Pegasus","Metcon"]
>>> df['Category'] = df["Product Name"].str.extract("(%s)" % "|".join(cats))
>>> df
                  Product Name Category
0            Nike Zoom Pegasus  Pegasus
1  All New Nike Zoom Pegasus 4  Pegasus
2                     Metcon 3   Metcon
3                Nike Metcon 5   Metcon
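If a category name could contain regex metacharacters, or some product names match no category at all, a slightly more defensive variant might look like the sketch below. The `re.escape` call, the `expand=False` argument, and the extra "Air Max 90" row are assumptions added for illustration; they are not part of the question's data.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "Product Name": [
        "Nike Zoom Pegasus",
        "All New Nike Zoom Pegasus 4",
        "Metcon 3",
        "Nike Metcon 5",
        "Air Max 90",  # hypothetical row that matches no category
    ]
})
cats = ["Pegasus", "Metcon"]

# Escape each category so it is matched literally, then join them
# into a single capture group: (Pegasus|Metcon)
pattern = "(" + "|".join(re.escape(c) for c in cats) + ")"

# With one capture group, expand=False returns a Series.
# Rows without a match end up as missing values.
df["Category"] = df["Product Name"].str.extract(pattern, expand=False)
print(df)
```

Rows that match none of the categories stay empty in the new column, so they can be filled or filtered afterwards with `fillna` or `dropna`.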
Upvotes: 0
Reputation: 481
How about:
import pandas as pd
df = {'Product Name': ['Nike Zoom Pegasus', 'All New Nike Zoom Pegasus 4', 'Metcon 3', 'Nike Metcon 5']}
c = set(['Metcon', 'Pegasus'])
categories = [c.intersection(pn.split(' ')) for pn in df['Product Name']]
df['Categories'] = categories
print(df)
>> {'Product Name': ['Nike Zoom Pegasus', 'All New Nike Zoom Pegasus 4', 'Metcon 3', 'Nike Metcon 5'], 'Categories': [{'Pegasus'}, {'Pegasus'}, {'Metcon'}, {'Metcon'}]}
Upvotes: 0
Reputation: 4130
How about this solution? When you have a new category, all you have to do is add it to the cats list.
import pandas as pd
import numpy as np
df = pd.DataFrame({'Product Name': ['Nike Zoom Pegasus', 'All New Nike Zoom Pegasus 4', 'Metcon 3', 'Nike Metcon 5']})
cats = ["Pegasus","Metcon"]
df["Category"] = df["Product Name"].apply(lambda x: np.intersect1d(x.split(" "),cats)[0])
Output:
                  Product Name Category
0            Nike Zoom Pegasus  Pegasus
1  All New Nike Zoom Pegasus 4  Pegasus
2                     Metcon 3   Metcon
3                Nike Metcon 5   Metcon
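One caveat: indexing the result of `np.intersect1d` with `[0]` raises an `IndexError` when a product name contains none of the categories. A minimal sketch of a guard is shown below; the helper name `first_category`, its `default` parameter, and the "Air Max 90" row are hypothetical and only there to illustrate the idea.

```python
import numpy as np
import pandas as pd

cats = ["Pegasus", "Metcon"]


def first_category(name, default=None):
    """Return a category word found in `name`, or `default` if there is none."""
    # intersect1d returns the sorted, unique values present in both inputs
    matches = np.intersect1d(name.split(" "), cats)
    return str(matches[0]) if matches.size else default


df = pd.DataFrame({
    "Product Name": ["Nike Zoom Pegasus", "Metcon 3", "Air Max 90"]
})
df["Category"] = df["Product Name"].apply(first_category)
print(df)
```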
Upvotes: 1
Reputation: 681
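The problems with your code are as follows:
1. Inside the function you test df["Product Name"], which returns the entire Series, instead of the single product string that was passed in. Using or on two Series is what raises the ValueError.
2. You return product (the full name) rather than the category, Pegasus or Metcon.
I think you want something like this: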
def get_category(product):
    if "Pegasus" in product:
        return "Pegasus"
    elif "Metcon" in product:
        return "Metcon"
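For completeness, here is a sketch of how a function like this could be applied to the column from the question. The explicit `return None` fallback is an assumption added so that names without a known category are handled visibly.

```python
import pandas as pd


def get_category(product):
    # `product` is a single string here, so `in` yields a plain bool
    if "Pegasus" in product:
        return "Pegasus"
    elif "Metcon" in product:
        return "Metcon"
    return None  # assumed fallback: no known category in the name


df = pd.DataFrame({
    "Product Name": [
        "Nike Zoom Pegasus",
        "All New Nike Zoom Pegasus 4",
        "Metcon 3",
        "Nike Metcon 5",
    ]
})

# apply calls get_category once per value, passing each string in turn
df["Category"] = df["Product Name"].apply(get_category)
print(df)
```

Passing the function directly to `apply` is enough; wrapping it in a `lambda` adds nothing.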
Upvotes: 0