Programmatically figuring out if translated names are equivalent

Question

I'm trying to see if two translated names are equivalent. Sometimes the translation will have the names ordered differently. For example:

>>> import difflib
>>> a = 'Yuk-shing Au'
>>> b = 'Au Yuk Sing'
>>> seq = difflib.SequenceMatcher(a=a.lower(), b=b.lower())
>>> seq.ratio()
0.6086956521739131

'Yuk-shing Au' and 'Au Yuk Sing' refer to the same person. Is there a way to detect this, so that the ratio for names like these is much higher, similar to the result for:

>>> a = 'Yuk-shing Au'
>>> b = 'Yuk Sing Au'
>>> seq = difflib.SequenceMatcher(a=a.lower(), b=b.lower())
>>> seq.ratio()
0.8181818181818182

orlp · Accepted Answer

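You can normalize each name before comparing: lowercase it, treat hyphens as spaces, and sort the parts so that word order no longer matters:

def normalize(name):
    # Lowercase before sorting; otherwise "Sing" sorts before "Yuk" but
    # "shing" sorts after it, and the two names end up in different orders.
    name_parts = name.lower().replace("-", " ").split()
    return " ".join(sorted(name_parts))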
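
To tie this back to the question, here is a minimal sketch that feeds normalized names into difflib.SequenceMatcher. The helper name name_similarity is hypothetical (it is not part of difflib), and the sketch lowercases before sorting so that 'shing' and 'Sing' land in the same position regardless of capitalization:

```python
import difflib


def name_similarity(a, b):
    """Return a 0..1 similarity score that ignores word order, case and hyphens.

    Hypothetical helper for illustration; not part of difflib.
    """
    def normalize(name):
        # Lowercase first so the sort is case-insensitive, then treat
        # hyphens as separators and sort the parts alphabetically.
        return " ".join(sorted(name.lower().replace("-", " ").split()))

    return difflib.SequenceMatcher(a=normalize(a), b=normalize(b)).ratio()


score = name_similarity("Yuk-shing Au", "Au Yuk Sing")
print(score)
```

Both names reduce to 'au shing yuk' and 'au sing yuk', so the only remaining difference is the romanization ('shing' vs. 'sing'), and the ratio should come out well above the 0.61 from the question. Sorting is a heuristic: it assumes both translations contain the same set of name parts.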
