Akihero3

Reputation: 425

Is there a way to do fuzzy string matching for multiple words in a string?

I want to do fuzzy matching of a multi-word query against a longer string.

The target string could be, for example, "Hello, I am going to watch a film today.",
and the words I want to search for are "flim toda".

Ideally this should return "film today" as the search result.

I have tried the following method, but it seems to work only with a single word.

import difflib

def matches(large_string, query_string, threshold):
    words = large_string.split()
    matched_words = []
    for word in words:
        s = difflib.SequenceMatcher(None, word, query_string)
        match = ''.join(word[i:i+n] for i, j, n in s.get_matching_blocks() if n)
        if len(match) / float(len(query_string)) >= threshold:
            matched_words.append(match)
    return matched_words
large_string = "Hello, I am going to watch a film today"
query_string = "film"
print(list(matches(large_string, query_string, 0.8)))

This only works with a single word, and it only returns a match when there is little noise.

Is there any way to do this kind of fuzzy matching with several words?

Upvotes: 1

Views: 1029

Answers (2)

Amila Viraj

Reputation: 1064

You can simply use the fuzzysearch library; see the example below:

from fuzzysearch import find_near_matches

text_string = "Hello, I am going to watch a film today."
matches = find_near_matches('flim toda', text_string, max_l_dist=2)

print([text_string[m.start:m.end] for m in matches])

This will give you the following output, which is the closest match within the allowed distance:

['film toda']

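Please note that max_l_dist is the maximum Levenshtein distance (number of inserted, deleted, or substituted characters) allowed for a match, so you can raise or lower it depending on how much noise you are willing to tolerate.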
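To make the role of the distance limit concrete, here is a minimal stdlib-only sketch of approximate substring search by dynamic programming. It is not how fuzzysearch is implemented; the names find_best_near_match and expand_to_words are hypothetical helpers introduced for illustration. The second helper widens the match to whole words, which is one way to get "film today" rather than "film toda". It assumes a non-empty query.

```python
def find_best_near_match(query, text, max_l_dist):
    """Return (start, end, distance) for the substring text[start:end] that is
    closest to `query` in Levenshtein distance, or None if no substring is
    within `max_l_dist` edits. Hypothetical helper, assumes a non-empty query."""
    m = len(query)
    # prev[i] = (distance, start) for aligning query[:i] with a substring of
    # text that ends at the previous text position.
    prev = [(i, 0) for i in range(m + 1)]
    best = None
    for j, ch in enumerate(text, start=1):
        # An empty query prefix matches for free, starting at position j.
        cur = [(0, j)]
        for i in range(1, m + 1):
            cost = 0 if query[i - 1] == ch else 1
            cur.append(min(
                (prev[i - 1][0] + cost, prev[i - 1][1]),  # match or substitution
                (prev[i][0] + 1, prev[i][1]),             # extra character in text
                (cur[i - 1][0] + 1, cur[i - 1][1]),       # query character missing from text
            ))
        dist, start = cur[m]
        # Keep the first substring with the smallest distance seen so far.
        if dist <= max_l_dist and (best is None or dist < best[2]):
            best = (start, j, dist)
        prev = cur
    return best


def expand_to_words(text, start, end):
    """Widen [start, end) so that it does not cut through a word."""
    while start > 0 and text[start - 1].isalnum():
        start -= 1
    while end < len(text) and text[end].isalnum():
        end += 1
    return start, end


text_string = "Hello, I am going to watch a film today."
found = find_best_near_match("flim toda", text_string, max_l_dist=2)
if found is not None:
    start, end, dist = found
    w_start, w_end = expand_to_words(text_string, start, end)
    print(text_string[start:end], "->", text_string[w_start:w_end])
```

With max_l_dist=1 the same query finds nothing, because swapping "li" and "il" already costs two edits under plain Levenshtein distance.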

Upvotes: 2

amirouche

Reputation: 7873

The feature you are thinking of is called "query suggestion". It does rely on spell checking, but it also relies on Markov chains built from search engine query logs.

That being said, you could use an approach similar to the one described in this answer: https://stackoverflow.com/a/58166648/140837
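The linked answer is not reproduced here. Independently of it, a minimal stdlib-only sketch in the spirit of the question's difflib attempt is to compare the query against every window of consecutive words instead of against single words. The function name best_word_window and the choice of a window as long as the query's word count are assumptions made for illustration.

```python
import difflib


def best_word_window(large_string, query_string, threshold):
    """Return (window_text, ratio) for the run of consecutive words most
    similar to `query_string`, or None if no run reaches `threshold`.
    Hypothetical helper: the window has as many words as the query."""
    words = large_string.split()
    n = len(query_string.split())
    best = None
    for i in range(len(words) - n + 1):
        candidate = " ".join(words[i:i + n])
        # ratio() is 2 * matched_characters / total_characters, in [0, 1].
        ratio = difflib.SequenceMatcher(
            None, candidate.lower(), query_string.lower()
        ).ratio()
        if ratio >= threshold and (best is None or ratio > best[1]):
            best = (candidate, ratio)
    return best


large_string = "Hello, I am going to watch a film today"
result = best_word_window(large_string, "flim toda", 0.8)
print(result)
```

This returns whole words by construction, but it only considers windows with exactly as many words as the query, so a query that merges or splits words would need several window sizes.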

Upvotes: 1
