Hernandez

Reputation: 11

How to check if string is similar to any string in list

I am trying to figure out whether my string is similar to any of the strings in a list. The problem is that my loop only iterates up to the size of my list, not over the length of my string. Any suggestions?

my_string = "aplpe"
my_list=["orange", "apple", "grape"]
correctamount=0
if(my_string in my_list):
    print("Passed")
else:
    if any(my_string in s for s in my_list):
        for i in range(len(my_string) + 1):
            if my_string[i] == my_list[i][i]:
                correctamount += 1
                print(correctamount)
            else:
                correctamount == 0
                print(correctamount)

        if((correctamount/len(my_list) + 1 ) > .75):
            print("Passed")
        else:
            print("Failure")

Upvotes: 1

Views: 2975

Answers (3)

Vishnudev Krishnadas

Reputation: 10960

There are many algorithms for measuring the similarity between strings, and Python has a library called textdistance that implements many of them.

Here I use Jaccard similarity, which seems to fit your requirements, but you should choose the algorithm that best suits your needs.

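import textdistance as td

my_string = "aplpe"
my_list = ["orange", "apple", "grape"]
# Normalized Jaccard similarity (0.0 to 1.0) of my_string against each candidate
similarity_perc = [td.jaccard.normalized_similarity(my_string, s) for s in my_list]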

Similarity score for each string (rounded to two decimals):

[0.22, 1.0, 0.43]
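To see where these numbers come from, here is a minimal standard-library sketch of Jaccard similarity over character multisets. It assumes textdistance's default Jaccard settings (single characters, repeated letters counted rather than deduplicated); `jaccard_similarity` is a hypothetical helper, not part of textdistance. Up to rounding, it agrees with the scores above. Note that character-level Jaccard ignores letter order, which is why the misspelling "aplpe" scores a perfect 1.0 against "apple": any anagram of "apple" would too.

```python
from collections import Counter


def jaccard_similarity(a: str, b: str) -> float:
    """Multiset Jaccard similarity of the characters in a and b (hypothetical helper)."""
    ca, cb = Counter(a), Counter(b)
    # Intersection size: each shared character counted min(times in a, times in b)
    intersection = sum((ca & cb).values())
    # Union size: each character counted max(times in a, times in b)
    union = sum((ca | cb).values())
    # Two empty strings are treated as identical
    return intersection / union if union else 1.0


my_string = "aplpe"
my_list = ["orange", "apple", "grape"]
scores = [round(jaccard_similarity(my_string, s), 2) for s in my_list]
print(scores)  # [0.22, 1.0, 0.43]
```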

Get the index of the most similar string

most_similar_index = similarity_perc.index(max(similarity_perc))
# The "no match found" check is omitted here; add one yourself.
print(my_list[most_similar_index])

Output

apple
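The omitted "no match found" check might look like the following minimal sketch. The helper name `best_match` and the 0.75 cutoff (borrowed from the question's `> .75` test) are illustrative assumptions, and the scores are hard-coded, illustrative values so the example runs without textdistance installed.

```python
from typing import Optional, Sequence


def best_match(candidates: Sequence[str], scores: Sequence[float],
               threshold: float = 0.75) -> Optional[str]:
    """Return the best-scoring candidate, or None if no score exceeds threshold."""
    if not candidates or not scores:
        return None  # empty list: nothing to match against
    # Index of the highest score (first one wins on ties)
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    if scores[best_index] <= threshold:
        return None  # even the closest candidate is not similar enough
    return candidates[best_index]


my_list = ["orange", "apple", "grape"]
similarity_perc = [0.22, 1.0, 0.43]  # illustrative scores, hard-coded

match = best_match(my_list, similarity_perc)
print(match if match is not None else "No similar string found")  # apple
```

Returning None instead of an index keeps the caller's "Passed"/"Failure" decision to a single `is None` check.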

A benchmark of textdistance with other libraries is given here if you are looking to use this for a large dataset.

Upvotes: 2

bigbounty

Reputation: 17408

There's a library called jellyfish for this purpose - https://github.com/jamesturk/jellyfish

>>> import jellyfish
>>> jellyfish.levenshtein_distance(u'jellyfish', u'smellyfish')
2
>>> jellyfish.jaro_distance(u'jellyfish', u'smellyfish')
0.89629629629629637
>>> jellyfish.damerau_levenshtein_distance(u'jellyfish', u'jellyfihs')
1

The library implements several string-matching algorithms:

Levenshtein Distance
Damerau-Levenshtein Distance
Jaro Distance
Jaro-Winkler Distance
Match Rating Approach Comparison
Hamming Distance
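jellyfish's own implementations are the practical choice, but as a rough illustration of what this family of metrics computes, here is a pure-Python sketch of the optimal string alignment (OSA) distance: Levenshtein's insert, delete and substitute edits plus swaps of adjacent characters. OSA is a restricted variant of Damerau-Levenshtein and is not necessarily identical to jellyfish's `damerau_levenshtein_distance`, although for the two pairs in the session above it gives the same values (2 and 1). `osa_distance` is a hypothetical helper, not a jellyfish function.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: Levenshtein plus adjacent transpositions."""
    # d[i][j] = number of edits needed to turn a[:i] into b[:j]
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i  # delete all of a[:i]
    for j in range(len(b) + 1):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Adjacent transposition, e.g. "lp" <-> "pl"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


my_string = "aplpe"
my_list = ["orange", "apple", "grape"]
# Pick the candidate with the fewest edits
closest = min(my_list, key=lambda s: osa_distance(my_string, s))
print(closest)  # apple
```

Counting the swap as one edit is what puts "aplpe" a single edit away from "apple"; plain Levenshtein would count it as two substitutions.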

Upvotes: 3

Deepak Tatyaji Ahire

Reputation: 5311

I believe in writing clean code and separating out individual pieces of functionality, so that the code is easy to read and easy to contribute to.

I defined a function, is_similar, that computes a similarity percentage and checks it against a threshold.

Have a look at the following implementation:

def is_similar(my_string, test_string):
    if not my_string:
        return False  # nothing to compare; also avoids dividing by zero
    # Count characters that match at the same position (zip stops at the shorter string)
    count = sum(1 for a, b in zip(my_string, test_string) if a == b)
    # Fraction of my_string's characters that matched in place
    similarity_percentage = count / len(my_string)
    print("Similarity Percentage: ", similarity_percentage)
    return similarity_percentage > 0.75

my_string = "aplpe"
my_list = ["orange", "apple", "grape"]

if my_string in my_list:
    print("Passed - Identical")
else:
    # Report a verdict for each candidate in turn
    for candidate in my_list:
        if is_similar(my_string, candidate):
            print("Passed - Similar with", candidate)
        else:
            print("Failure")

Output:

Similarity Percentage:  0.0
Failure
Similarity Percentage:  0.6
Failure
Similarity Percentage:  0.4
Failure

Case 2:

If

my_string = "aplpe"
my_list=["orange", "apppe", "grape"]

Then, output:

Similarity Percentage:  0.0
Failure
Similarity Percentage:  0.8
Passed - Similar with apppe
Similarity Percentage:  0.4
Failure

Case 3:

If

my_string = "aplpe"
my_list=["orange", "aplpe", "grape"]

Then, output:

Passed - Identical
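If you want the check to tolerate swapped letters, and to print a single verdict instead of one "Failure" per list item, one hedged alternative is the standard library's `difflib.get_close_matches`, which scores candidates with `SequenceMatcher.ratio()` instead of the position-by-position comparison in is_similar. This is a different technique, not a drop-in equivalent: the 0.75 cutoff mirrors the threshold above, but get_close_matches keeps scores greater than or equal to the cutoff rather than strictly greater.

```python
import difflib

my_string = "aplpe"
my_list = ["orange", "apple", "grape"]

if my_string in my_list:
    print("Passed - Identical")
else:
    # At most one candidate whose SequenceMatcher ratio is at least 0.75
    matches = difflib.get_close_matches(my_string, my_list, n=1, cutoff=0.75)
    if matches:
        print("Passed - Similar with", matches[0])
    else:
        print("Failure")
```

With the original list, "apple" scores 0.8 here and passes, whereas is_similar rejects it at 0.6 because the swapped "lp" breaks the positional match.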

Upvotes: 0
