Surprising similarity results from the Python ngram library

Question

I am experimenting with Python's ngram library and ran into an issue with its string similarity scores. The ratios it returns are a bit confusing. Here is what I tried:

>>> import ngram
>>> ngram.NGram.compare('alexp','Alex Cho',N=1)*100
30.0
>>> ngram.NGram.compare('alexp','Alex Plutzer',N=1)*100
21.428571428571427
>>> ngram.NGram.compare('alexp','Alex Plutzer'.lower(),N=1)*100
41.66666666666667
>>> ngram.NGram.compare('alexp','Alex Cho'.lower(),N=1)*100
44.44444444444444
>>> ngram.NGram.compare('alexp','AlexCho'.lower(),N=1)*100
50.0
>>> ngram.NGram.compare('alexp','AlexPlutzer'.lower(),N=1)*100
45.45454545454545

The most similar candidate should be the one that contains alexp, i.e. Alex Plutzer, but the higher score is assigned to Alex Cho instead.
What can I do to get a more appropriate result, where Alex Plutzer scores higher than the competing candidate?
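
For reference, the scores in the transcript appear consistent with a plain multiset Jaccard ratio over single characters: shared characters divided by (characters in the first string + characters in the second string - shared characters). For example, 'alexp' and 'alex cho' share 4 characters, so the score is 4 / (5 + 8 - 4) = 44.4%, while 'alexp' and 'alex plutzer' share 5, giving 5 / (5 + 12 - 5) = 41.7%. The longer name is penalized for its extra characters. The sketch below reproduces that arithmetic in plain Python and contrasts it with a containment score (the fraction of the query's bigrams found in the candidate), which does not penalize candidate length. This is inferred from the numbers, not taken from the library's source, and the helper names `char_ngrams`, `jaccard`, `containment`, and `normalize` are hypothetical, not part of the ngram package.

```python
from collections import Counter


def char_ngrams(text, n=1):
    """Return a multiset (Counter) of the character n-grams in text, with no padding."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def jaccard(a, b, n=1):
    """Shared n-grams divided by the size of the multiset union of both strings' n-grams."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    shared = sum((grams_a & grams_b).values())  # multiset intersection
    total = sum(grams_a.values()) + sum(grams_b.values()) - shared
    return shared / total if total else 0.0


def containment(query, candidate, n=2):
    """Fraction of the query's n-grams that also occur in the candidate."""
    grams_q, grams_c = char_ngrams(query, n), char_ngrams(candidate, n)
    total = sum(grams_q.values())
    return sum((grams_q & grams_c).values()) / total if total else 0.0


def normalize(name):
    """Lowercase and drop whitespace, so 'Alex Plutzer' becomes 'alexplutzer'."""
    return "".join(name.lower().split())


query = "alexp"
for name in ("Alex Cho", "Alex Plutzer"):
    # Jaccard on lowercased names (as in the transcript) vs. bigram containment.
    print(name, jaccard(query, name.lower()), containment(query, normalize(name)))
```

With bigrams on the normalized names, 'alexp' yields al, le, ex, xp; all four occur in 'alexplutzer' (containment 1.0) but only three occur in 'alexcho' (containment 0.75), so Alex Plutzer ranks first. Whether containment is the right measure depends on the use case: it suits prefix-style lookups like this one, but it treats any candidate that contains the query as a perfect match.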
