Replacing values from two columns when they are similar

Question

I have 2 columns; Name and BrandName. The first column contains values like the name of the products, whereas the second one is the product's brand name. There are some occasions where they have similar names - for example, Coca Cola.

In the image above you can see exactly what I am talking about. I will leave a piece of code for testing purposes:

data = [['Coca Cola', 'Coca Cola'], ['Coca Cola cherry', 'Coca Cola'], ['Coca Cola life', 'Coca Cola'], ['Coca Cola Life', 'Coca Cola']]
testdf = pd.DataFrame(data, columns = ['Name', 'BrandName'])

What I am trying to do is set every value from column BrandName to "Unknown" whenever the product's name contains the same words as the brand name. Take the image above, for example. I was only able to set the first row's Brand name to "Unknown" using the code below:

testdf["BrandName"] = np.where(testdf["Name"] == testdf["BrandName"], "Unknown", testdf["BrandName"])

However, I didn't figure out yet how to set the condition in a way so when Name contains the whole content of column BrandName, the latter's value becomes "Unknown".

jezrael · Accepted Answer

You can pass custom lambda function from this to np.where:

import re

f = lambda x: bool(re.search(r'\b{}\b'.format(x['BrandName']), x['Name']))

testdf['match'] = np.where(testdf.apply(f, axis = 1), "Unknown", testdf["BrandName"])
print (testdf)
               Name  BrandName    match
0         Coca Cola  Coca Cola  Unknown
1  Coca Cola cherry  Coca Cola  Unknown
2    Coca Cola life  Coca Cola  Unknown
3    Coca Cola Life  Coca Cola  Unknown

Replacing values from two columns when they are similar

Answers (2)

Related Questions