Reputation: 622
I am trying to match the names in two columns of the same dataframe. I want to create a function that returns True if the name in one column is an acronym of the other, even if the longer name also contains the acronym itself as a substring.
pd.DataFrame([['Global Workers Company gwc', 'gwc'], ['YTU', 'your team united']] , columns=['Name1','Name2'])
Desired Output:
                        Name1             Name2  Match
0  Global Workers Company gwc               gwc   True
1                         YTU  your team united   True
I have tried creating a lambda function that extracts only the acronym, but haven't been able to get it working:
t = 'Global Workers Company gwc'
[x[0] for x in t.split()]
['G', 'W', 'C', 'g']
"".join(word[0][0] for word in test1.Name2.str.split()).upper()
Upvotes: 1
Views: 466
Reputation: 15588
I would use a mapper: a lookup dictionary that transforms the values in both columns to the same form, so that they can be checked for equality.
import pandas as pd

# data
df = pd.DataFrame([['Global Workers Company', 'gwc'], ['YTU', 'your team united']], columns=['Name1', 'Name2'])

# create a mapper from each acronym to its full name
mapper = {'gwc': 'Global Workers Company',
          'YTU': 'your team united'}

def replacer(value, mapper=mapper):
    '''Takes in a value and returns its mapped form;
    if no mapping is found, returns the original value.
    '''
    return mapper.get(value, value)

# create the checker column; assign returns a new dataframe, so keep the result
df = df.assign(
    checker=lambda frame: frame['Name1'].map(replacer) == frame['Name2'].map(replacer)
)
print(df)
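The mapper above only covers pairs that were typed into the dictionary by hand. If maintaining that dictionary is not practical, one possible alternative is to derive the initials from the longer name and check whether the shorter name appears in them. This is only a rough sketch: `initials` and `is_acronym_pair` are hypothetical helper names, the third row is extra illustrative data, and treating the shorter string as the acronym is an assumption.

```python
import pandas as pd

def initials(name: str) -> str:
    """Hypothetical helper: lower-cased first letter of each word."""
    return "".join(word[0] for word in name.split()).lower()

def is_acronym_pair(a: str, b: str) -> bool:
    """Assume the shorter string is the acronym and the longer one the full name."""
    short, long_ = sorted((a, b), key=len)
    short = short.strip().lower()
    # Substring test, so 'gwc' still matches the initials 'gwcg'
    # of 'Global Workers Company gwc'.
    return bool(short) and short in initials(long_)

df = pd.DataFrame(
    [["Global Workers Company gwc", "gwc"],
     ["YTU", "your team united"],
     ["ABC", "your team united"]],  # extra row added to show a non-match
    columns=["Name1", "Name2"],
)
df["Match"] = [is_acronym_pair(a, b) for a, b in zip(df["Name1"], df["Name2"])]
print(df)
```

Because this is a plain substring test on the initials, it will also accept partial acronyms such as "wc" for "Global Workers Company"; whether that is acceptable depends on the data.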
Upvotes: 0
Reputation: 71687
You can use the DataFrame.apply function with the axis=1 parameter to apply a custom function to each row of the dataframe. You can then use regular expressions to compare the acronym with the corresponding longer name or phrase.
Try this:
import re

def func(x):
    s1 = x["Name1"]
    s2 = x["Name2"]
    # treat the shorter string as the acronym and the longer one as the full form
    acronym = s1 if len(s1) < len(s2) else s2
    fullform = s2 if len(s1) < len(s2) else s1
    # each acronym letter must start a word in the full form, in order
    fmtstr = ""
    for a in acronym:
        fmtstr += (r"\b" + re.escape(a) + r".*?\b")
    return bool(re.search(fmtstr, fullform, flags=re.IGNORECASE))

df["Match"] = df.apply(func, axis=1)
print(df)
Output:
                        Name1             Name2  Match
0  Global Workers Company gwc               gwc   True
1                         YTU  your team united   True
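To see why this works, it can help to look at the pattern that func builds for a single acronym. The sketch below pulls that step out into two hypothetical helpers, `acronym_pattern` and `matches`, which operate on plain strings rather than dataframe rows; `re.escape` is added here only as a precaution for acronyms that contain regex metacharacters.

```python
import re

def acronym_pattern(acronym: str) -> str:
    """Build one word-start anchor per acronym letter, in order."""
    return "".join(r"\b" + re.escape(ch) + r".*?\b" for ch in acronym)

def matches(acronym: str, fullform: str) -> bool:
    """True if each acronym letter starts a word of fullform, in order."""
    pattern = acronym_pattern(acronym)
    return re.search(pattern, fullform, flags=re.IGNORECASE) is not None

print(acronym_pattern("gwc"))
print(matches("gwc", "Global Workers Company gwc"))  # letters start words in order
print(matches("YTU", "your team united"))            # case is ignored
print(matches("gwc", "Global Company Workers"))      # wrong word order
```

Note that `.*?` can span whole words, so the pattern allows words to be skipped: "gc" would also match "Global Workers Company". If every word must contribute a letter, a stricter comparison would be needed.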
Upvotes: 2