chetan reddy

Reputation: 320

Similarity measurement among names?

I have a list of names, and for any given query name I am trying to find the 5 most similar names in the list. I thought of applying word2vec or using Text.similar() from NLTK, but I am not sure whether these will work for names as well.

Any similarity measure would work for me. Any suggestions? This is not for a project; I just want to learn new things.

Upvotes: 0

Views: 1844

Answers (1)

Aditya Mukherji

Reputation: 9256

Since you added NLTK, I assume you are fine working in Python.
Check out the Jellyfish library, which contains 10 different algorithms for comparing strings. Some of them compare just the characters, while others try to guess how a string would be pronounced and so help you identify names that are spelt very differently but sound similar.
The actual algorithms are all written in C, so the library is pretty efficient!
I think you will find the Jaro-Winkler distance to be most useful. Also check out this paper.
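To make the idea concrete, here is a minimal pure-Python sketch of Jaro-Winkler similarity used to rank the 5 closest names. It is not Jellyfish's implementation; it only illustrates the technique and runs without extra installs. The function names (jaro, jaro_winkler, most_similar), the sample names, the lowercasing step, and the choice to always apply the prefix bonus (some variants apply it only above a threshold) are assumptions made for this example.

```python
def jaro(s1, s2):
    """Jaro similarity in [0, 1]; 1.0 means the strings are identical."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters count as matching only if they are within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2

    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1, s2, prefix_weight=0.1, max_prefix=4):
    """Jaro similarity plus a bonus for a shared prefix (up to max_prefix chars)."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * prefix_weight * (1.0 - base)


def most_similar(query, names, n=5):
    """Return the n (name, score) pairs most similar to query, best first."""
    q = query.lower()  # assumption: comparison should ignore case
    scored = [(name, jaro_winkler(q, name.lower())) for name in names]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    names = ["Mary", "John", "Jon", "Joan", "Peter", "Johnny", "Jane"]
    for name, score in most_similar("john", names):
        print(f"{name}: {score:.3f}")
```

This scans the whole list for every query, which is fine for a few thousand names. For a larger list, the C-backed functions in Jellyfish should be faster; check its documentation for the exact function name, since it has changed between releases.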

Upvotes: 4
