Supratim Haldar

Reputation: 2426

Comparing names in different formats using Python

I want to compare names that are written in different formats, e.g. "George W. Bush", "George Bush", "George Walker Bush", "Bush, George Walker", "Bush, GW", "Bush, George", etc. There are a few with dots (".") as well, but I omitted those from the list because I will normalize them anyway. In fact, the commas (",") will be stripped as well.

What is the best and most efficient approach to determine whether any two given names actually represent the same person? I have thought of using nameparser and building a comparison algorithm on top of it, but please suggest any other options. An approach using only Python's standard modules would be fine too.

Upvotes: 1

Views: 820

Answers (2)

Neo

Reputation: 56

There's an open-source library that may be useful, or that can at least serve as a base to build more functionality on.

https://github.com/rliebz/whoswho

Sample usage:

>>> from whoswho import who
>>> who.match('Bush, G.W.', 'George W. Bush')

Upvotes: 1

olinox14

Reputation: 6663

Probably the most accurate way of doing this is to use an NLP library such as spaCy, which lets you compute similarities between words.

If you want a simpler way of doing this, you could implement a simple algorithm, something like:

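def norm(name):
    # Lowercase, drop dots, then sort the characters.
    # sorted() returns a list, so join it back into a string.
    return ''.join(sorted(name.lower().replace('.', '')))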

Then measure the difference between the two resulting strings, for example with an edit distance.

But this obviously won't give a definitive answer, only a rough similarity score.
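As a rough illustration of the "normalize, then measure the difference" idea using only the standard library, here is a minimal sketch. It is a variant of the approach above, not the same code: it sorts whole words instead of characters, and it uses difflib.SequenceMatcher as the difference measure. The helper names normalize_name and similarity are made up for this example, and any cut-off you pick for "same person" is an assumption you would have to tune on your own data.

```python
from difflib import SequenceMatcher


def normalize_name(name):
    """Hypothetical helper: lowercase, drop dots, treat commas as spaces,
    and sort the words so that word order no longer matters."""
    cleaned = name.lower().replace('.', '').replace(',', ' ')
    return ' '.join(sorted(cleaned.split()))


def similarity(a, b):
    """Hypothetical helper: return a ratio between 0 and 1, where 1.0
    means the two normalized names are identical."""
    return SequenceMatcher(None, normalize_name(a), normalize_name(b)).ratio()


if __name__ == '__main__':
    pairs = [
        ('George Walker Bush', 'Bush, George Walker'),
        ('George W. Bush', 'George Bush'),
        ('George W. Bush', 'Barack Obama'),
    ]
    for left, right in pairs:
        print(left, '|', right, '|', round(similarity(left, right), 3))
```

Note that a purely string-based score like this cannot tell that "GW" stands for "George Walker"; handling initials would need extra logic on top.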

Upvotes: 1
