Word similarity between mail addresses and names

Question

My problem is a little different from simple word similarity. Is there an algorithm for calculating the similarity between an email address and a name?

    For example:
    mail Abd_tml_1132@gmail.com
    name Abdullah temel
    Levenshtein, Hamming distance  11
    Jaro distance  0.52

But most likely, this email address belongs to this name.

Rahul Agarwal · Accepted Answer

I don't know of a package that does this directly, but the following approach can solve your problem.

Turn the email ID into a list of words:

a = 'Abd_tml_1132@gmail.com'
rest = a.split('@', 1)[0]  # keep only the part before the @
result = ''.join(ch for ch in rest if not ch.isdigit())  # drop digits, since names do not contain them
list_of_email_words = result.split('_')  # split into words; change the separator (e.g. '.' instead of '_') to suit the email ID
list_of_email_words = list(filter(None, list_of_email_words))  # remove empty strings
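
The separator remark above can be generalized. The following is a minimal sketch, assuming the part before the @ may mix `_`, `.`, `-`, `+` and digits, that splits on any run of non-letter characters in one step. The helper name `email_tokens` is hypothetical and not part of the original answer. Unlike the snippet above, digits act as separators here rather than being deleted first.

```python
import re

def email_tokens(address):
    """Return the alphabetic chunks of the part before '@', lowercased."""
    local_part = address.split('@', 1)[0]
    # Split on every run of characters that is not an ASCII letter
    # (digits, '_', '.', '-', '+'), then drop the empty strings.
    return [t.lower() for t in re.split(r'[^A-Za-z]+', local_part) if t]

print(email_tokens('Abd_tml_1132@gmail.com'))  # ['abd', 'tml']
```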

Turn the name into a list of words:

b = 'Abdullah temel'
list_of_name_words = b.split(' ')

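Apply fuzzy matching to every pair of words from the two lists:

from fuzzywuzzy import fuzz  # assumed source of fuzz; thefuzz and rapidfuzz also provide fuzz.partial_ratio

score = []
for email_word in list_of_email_words:
    for name_word in list_of_name_words:
        # compare each email word with each name word (score from 0 to 100)
        d = fuzz.partial_ratio(email_word, name_word)
        score.append(d)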

Now you just need to check whether any element of score exceeds a threshold of your choosing. For example:

threshold = 70
if any(x > threshold for x in score):
    print("matched")
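
The steps above can be combined into one function. The following is a rough sketch that swaps in the standard library's `difflib.SequenceMatcher` for `fuzz.partial_ratio`, so it runs without installing anything. It is a different scorer from the one in this answer: it returns values from 0.0 to 1.0 and compares whole words, so its scores are not comparable with `partial_ratio`. The function names and the 0.7 threshold are assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher

def best_similarity(address, name):
    """Highest pairwise similarity (0.0 to 1.0) between email words and name words."""
    local_part = address.split('@', 1)[0].lower()
    # Keep only the alphabetic chunks of the part before '@'.
    email_words = [w for w in re.split(r'[^a-z]+', local_part) if w]
    name_words = name.lower().split()
    scores = [
        SequenceMatcher(None, email_word, name_word).ratio()
        for email_word in email_words
        for name_word in name_words
    ]
    # An address with no letters before '@' yields no scores.
    return max(scores, default=0.0)

def is_match(address, name, threshold=0.7):
    """True if any email word is similar enough to any name word."""
    return best_similarity(address, name) > threshold

print(is_match('Abd_tml_1132@gmail.com', 'Abdullah temel'))  # True
```

The threshold needs tuning on real data: a value that is too low will match unrelated short words, and one that is too high will miss heavily abbreviated names.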
