Reputation: 23
My problem is a little different from simple word similarity. Is there an algorithm for calculating the similarity between an email address and a name?
For example:
mail [email protected]
Name Abdullah temel
Levenshtein, Hamming distance: 11
Jaro distance: 0.52
But most likely, this email address belongs to this name.
Upvotes: 2
Views: 70
Reputation: 130
FuzzyWuzzy can help here. First remove the '@' and the domain name from the address using a regex. You will then have two strings, as follows:
from fuzzywuzzy import fuzz as fz
str1 = "Abd_tml_1132"
str2 = "Abdullah temel"
count_ratio = fz.ratio(str1,str2)
print(count_ratio)
Output:
46
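The regex step mentioned above is not shown, so here is a minimal sketch of it using only the standard library. The helper name `local_part` and the `example.com` domain are placeholders for illustration, not part of the original post.

```python
import re


def local_part(address):
    """Return the address with '@' and everything after it removed."""
    # Hypothetical helper: drops the '@' and the domain, keeps the rest.
    return re.sub(r'@.*$', '', address)


# The domain here is a placeholder; only the part before '@' matters.
str1 = local_part("Abd_tml_1132@example.com")
print(str1)
```

The result can then be passed to `fz.ratio` together with the name, as in the snippet above.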
Upvotes: 0
Reputation: 4100
There is no dedicated package for this, but the following approach can solve your problem:
Turn the email ID into a list of words:
a = '[email protected]'
rest = a.split('@', 1)[0]  # keep only the part before '@'
result = ''.join([i for i in rest if not i.isdigit()])  # remove digits, since names do not contain digits
list_of_email_words = result.split('_')  # split into words; change the separator ('_', '.', ...) to suit the email ID
list_of_email_words = list(filter(None, list_of_email_words))  # remove any empty strings
Turn the name into a list of words:
b = 'Abdullah temel'
list_of_name_words = b.split(' ')
Apply fuzzy match to both lists:
from fuzzywuzzy import fuzz

score = []
# Compare every email word with every name word
for email_word in list_of_email_words:
    for name_word in list_of_name_words:
        score.append(fuzz.partial_ratio(email_word, name_word))
Now you just need to check whether any element of score exceeds a threshold of your choosing. For example:
threshold = 70
if any(x > threshold for x in score):
    print("matched")
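For reference, the steps above can be combined into one function. The sketch below is an approximation, not the fuzzywuzzy implementation: it swaps in the standard library's `difflib.SequenceMatcher` with a simple sliding window as a stand-in for `fuzz.partial_ratio`, so scores may differ from fuzzywuzzy's. The function names, the `example.com` domain, and the choice to split on any non-letter character (ASCII names assumed) are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher


def partial_similarity(a, b):
    """Best 0-100 score of the shorter string against any window of the longer one."""
    short, long_ = sorted((a.lower(), b.lower()), key=len)
    if not short:
        return 0
    best = 0.0
    # Slide a window the size of the shorter string over the longer string.
    for start in range(len(long_) - len(short) + 1):
        window = long_[start:start + len(short)]
        best = max(best, SequenceMatcher(None, short, window).ratio())
    return round(100 * best)


def email_matches_name(address, name, threshold=70):
    """Return True if any word of the email's local part resembles any name word."""
    rest = address.split('@', 1)[0]
    # Splitting on non-letters drops digits and separators in one step.
    email_words = [w for w in re.split(r'[^A-Za-z]+', rest) if w]
    name_words = name.split()
    scores = [partial_similarity(e, n) for e in email_words for n in name_words]
    return any(s > threshold for s in scores)


print(email_matches_name("Abd_tml_1132@example.com", "Abdullah temel"))
```

Because "Abd" appears verbatim at the start of "Abdullah", that pair alone clears the threshold. A short prefix like this can also match unrelated names, so the threshold and a minimum word length are worth tuning on real data.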
Upvotes: 1