Reputation: 1003
I have 2 columns; Name and BrandName. The first column contains values like the name of the products, whereas the second one is the product's brand name. There are some occasions where they have similar names - for example, Coca Cola.
In the image above you can see exactly what I am talking about. I will leave a piece of code for testing purposes:
data = [['Coca Cola', 'Coca Cola'], ['Coca Cola cherry', 'Coca Cola'], ['Coca Cola life', 'Coca Cola'], ['Coca Cola Life', 'Coca Cola']]
testdf = pd.DataFrame(data, columns = ['Name', 'BrandName'])
What I am trying to do is set every value from column BrandName to "Unknown" whenever the product's name contains the same words as the brand name. Take the image above, for example. I was only able to set the first row's Brand name to "Unknown" using the code below:
testdf["BrandName"] = np.where(testdf["Name"] == testdf["BrandName"], "Unknown", testdf["BrandName"])
However, I didn't figure out yet how to set the condition in a way so when Name contains the whole content of column BrandName, the latter's value becomes "Unknown".
Upvotes: 1
Views: 46
Reputation: 88285
Here's one way using mask
an apply
:
testdf['BrandName'] = (testdf.BrandName.mask(testdf.apply(
lambda x: x.BrandName in x.Name, axis=1),
'Unkown'))
Name BrandName
0 Coca Cola Unkown
1 Coca Cola cherry Unkown
2 Coca Cola life Unkown
3 Coca Cola Life Unkown
Upvotes: 1
Reputation: 863301
You can pass custom lambda function from this to np.where
:
import re
f = lambda x: bool(re.search(r'\b{}\b'.format(x['BrandName']), x['Name']))
testdf['match'] = np.where(testdf.apply(f, axis = 1), "Unknown", testdf["BrandName"])
print (testdf)
Name BrandName match
0 Coca Cola Coca Cola Unknown
1 Coca Cola cherry Coca Cola Unknown
2 Coca Cola life Coca Cola Unknown
3 Coca Cola Life Coca Cola Unknown
Upvotes: 1