Reputation: 113
There is a slight variant to this question Check if column value is in other columns in pandas
I have a dataframe called test
name_0 name_1 overall_name
Asda Nan Tesco
Asda Nan ASDA
LIDL 1 Asda Lidl
AAA Asda ASDA
AAA Asda ASDA
Sainsbury Nan Lidl
How do I check whether test.overall_name
is in any of the other columns ['name_0', 'name_1' etc]
ignoring the size of characters (lowercase/uppercase) and any special characters.
So my ideal dataframe should look like:
name_0 name_1 overall_name namematch
Asda Nan Tesco no match
Asda Nan ASDA match
LIDL 1 Asda Lidl match
AAA Asda ASDA match
AAA Asda ASDA match
Sainsbury Nan Lidl no match
Upvotes: 1
Views: 617
Reputation: 1049
Re-create the sample data frame:
df=pd.DataFrame({'name_0':['Asda','AS-DA','Asda','LIDL1','AAA','AAA','Sainsbury'],
'name_1':[np.nan,np.nan,'Asda','As da','Asda','Asda',np.nan],
'overall_name':['Tesco','ASDA','Lidl1','ASDA','ASDA','Lid1','As da']})
Convert float to string:
df=df.fillna('nan')
Remove special characters: '-' and ' ', note: need to import 'regex' lib
import re
df = df.applymap(lambda x: re.sub(r'-','', x))
df=df.applymap(lambda x: re.sub(r' ','',x))
Create a list:
name_0=df['name_0'].tolist()
name_1=df['name_1'].tolist()
name_concat=name_0+name_1
Obtain results:
df['namematch']=df['overall_name'].str.lower().isin([x.lower() for x in name_concat])
df['namematch']=np.where(df['namematch']==True,'match','nomatch')
Upvotes: 0
Reputation: 26676
IICU
Make df have a common case. Then use boolean indexing combined with np.where to check and attribute
Using boolean indexing combined with np.where check and attribute
Dataframe Used
df["namematch "] = np.where((df.drop("overall_name", 1).apply(lambda x:x.str.lower())).isin(df["overall_name"].str.lower()).any(1),'match','nomatch')
#Make df have a common case. In this situation made it lower case
#df=df.apply(lambda x:x.str.lower())
# df["namematch "] = np.where(df.drop("overall_name", 1).isin(df["overall_name"]).any(1),'match','nomatch')
Outcome
Upvotes: 0
Reputation: 612
Check this out:
This method converts and compares the values:
import pandas as pd
import re
def match (first, second, overall):
f = re.sub(r"[^a-zA-Z]"," ", first.lower()).strip()
s = re.sub(r"[^a-zA-Z]"," ", second.lower()).strip()
o = re.sub(r"[^a-zA-Z]"," ", overal.lower()).strip()
if f == o:
return 1
elif s == o:
return 1
else:
return 0
This line of code adds match column and applys the function to each row:
df['match'] = df.apply(lambda x: match(x['name_0'],x['name_1'],x['overall_name']),axis=1)
the result is something like this:
name_0 name_1 overall_name match
0 Asda Nan Tesco 0
1 Asda Nan ASDA 1
2 LIDL 1 Asda Lidl 1
3 AAA Asda ASDA 1
4 AAA Asda ASDA 1
5 Sainsbury Nan Lidl 0
let me know if it works for you.
Upvotes: 1