Reputation: 425
I want to do fuzzy matching of a multi-word query against a longer string.
The target string could be something like:
"Hello, I am going to watch a film today."
and the words I want to search for are:
"flim toda"
Ideally this should return "film today" as the search result.
I have tried the method below, but it only seems to work with a single word.
import difflib

def matches(large_string, query_string, threshold):
    words = large_string.split()
    matched_words = []
    for word in words:
        s = difflib.SequenceMatcher(None, word, query_string)
        match = ''.join(word[i:i+n] for i, j, n in s.get_matching_blocks() if n)
        if len(match) / float(len(query_string)) >= threshold:
            matched_words.append(match)
    return matched_words

large_string = "Hello, I am going to watch a film today"
query_string = "film"
print(list(matches(large_string, query_string, 0.8)))
This only works with a single-word query, and it only returns a match when there is little noise.
Is there any way to do this kind of fuzzy matching with multiple words?
Upvotes: 1
Views: 1029
Reputation: 1064
You can simply use the fuzzysearch package. Please see the example below:
from fuzzysearch import find_near_matches
text_string = "Hello, I am going to watch a film today."
matches = find_near_matches('flim toda', text_string, max_l_dist=2)
print([text_string[m.start:m.end] for m in matches])
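This will give you the following output:
['film toda']
Please note that max_l_dist is the maximum Levenshtein distance allowed, so you can set it based on how many edits you are willing to tolerate. The swapped letters in "flim" count as two substitutions, which is why 2 is used here.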
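If you want whole words back (the question asks for "film today" rather than "film toda") and would rather stay within the standard library, one option is to slide a window of words over the text and score each window with difflib. This is only a rough sketch, not what fuzzysearch does internally: the function name best_window is made up for illustration, and it assumes the match has the same number of words as the query.

```python
import difflib
import string

def best_window(text, query):
    """Return (window, ratio) for the run of words in `text` most similar to `query`.

    Hypothetical helper: assumes the match spans as many words as the query.
    """
    # Strip surrounding punctuation so "today." is compared as "today".
    words = [w.strip(string.punctuation) for w in text.split()]
    n = len(query.split())
    best, best_ratio = "", 0.0
    for i in range(len(words) - n + 1):
        candidate = " ".join(words[i:i + n])
        ratio = difflib.SequenceMatcher(
            None, candidate.lower(), query.lower()
        ).ratio()
        if ratio > best_ratio:
            best, best_ratio = candidate, ratio
    return best, best_ratio

text = "Hello, I am going to watch a film today."
match, score = best_window(text, "flim toda")
print(match)  # film today
```

You would normally compare the returned ratio against a threshold of your own choosing before accepting the match, the same way the question's code does.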
Upvotes: 2
Reputation: 7873
The feature you are thinking of is called "query suggestion". It does rely on spell checking, but it also relies on Markov chains built from search engine query logs.
That being said, you can use an approach similar to the one described in this answer: https://stackoverflow.com/a/58166648/140837
Upvotes: 1