String search by coincidence?

Is there a simple way in Python to count how many characters of one string coincide with those of another? If anyone knows how it could be done, I'd appreciate a pointer.

To make myself clear, here is an example.

text_sample = "baguette is a french word"
words_to_match = ("baguete","wrd")

letters_to_match = ('b','a','g','u','t','e','w','r','d')   #   With just one 'e'
coincidences = sum(text_sample.count(x) for x in letters_to_match)

#    coincidences = 14 Current output
#    coincidences = 10 Expected output

My current method breaks words_to_match into single characters, as in letters_to_match, but then every occurrence of each of those letters anywhere in "baguette is a french word" is counted (coincidences = 14).

What I want is coincidences = 10, where only the letters of "baguete" inside "baguette" and the letters of "wrd" inside "word" are counted as coincidences, by checking the similarity between words_to_match and the words in text_sample.

How do I get my expected output?

Upvotes: 1

Answers (2)

jp_

Reputation: 87

First, split words_to_match into a tuple of letters:

    # Join the words into one string, then split it into characters
    letters = tuple(''.join(words_to_match))

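Then count the letters that appear in text_sample in the same order:

    x = 0            # index of the next letter to look for
    coincidence = 0
    for i in text_sample:
        # Stop comparing once every letter has been matched
        if x < len(letters) and letters[x] == i:
            x += 1
            coincidence += 1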

If the letters don't have to be in sequence, just do:

    coincidence = 0
    for i in text_sample:
        if i in letters:
            coincidence += 1

(Note that the body of the if can also go on the same line, but putting it on its own line is the usual style.)
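Putting the two variants together, here is a self-contained sketch you can run as-is. The function names `count_in_order` and `count_any_order` are made up for illustration. Keep in mind that the in-order version is greedy: it stalls on the first letter of words_to_match that never shows up later in the text, so it only reaches the full count when every letter can be matched in sequence.

```python
def count_in_order(text, words):
    """Greedily count letters of the joined words found, in order, in text."""
    letters = ''.join(words)
    x = 0  # index of the next letter we are waiting for
    for ch in text:
        if x < len(letters) and ch == letters[x]:
            x += 1
    return x


def count_any_order(text, words):
    """Count every character of text that occurs anywhere in the words."""
    letters = set(''.join(words))
    return sum(1 for ch in text if ch in letters)


text_sample = "baguette is a french word"
words_to_match = ("baguete", "wrd")

print(count_in_order(text_sample, words_to_match))   # 10
print(count_any_order(text_sample, words_to_match))  # 14
```

The first call gives the expected 10 from the question, and the second reproduces the 14 the original code returns.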

Upvotes: 1

pts

Reputation: 87271

It looks like you need the length of the longest common subsequence (LCS). The Wikipedia article on the LCS problem describes an algorithm for computing it. You may also be able to find a C extension that computes it quickly; a search turns up many results, including pylcs. After installation (pip install pylcs):

import pylcs
text_sample = "baguette is a french word"
words_to_match = ("baguete","wrd")
print(pylcs.lcs2(text_sample, ' '.join(words_to_match)))
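
If installing an extension is not an option, the textbook dynamic-programming algorithm is short enough to write in pure Python. The following is only a minimal sketch: the name `lcs_length` is made up here, and it is not meant to reflect how pylcs is implemented. It keeps a single previous row of the DP table, so memory use is proportional to the length of the second string.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)  # LCS lengths for the previous prefix of a
    for ch_a in a:
        curr = [0]  # column 0: empty prefix of b
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the best subsequence that ends before both characters
                curr.append(prev[j - 1] + 1)
            else:
                # Drop one character from either a or b, keep the better result
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


text_sample = "baguette is a french word"
words_to_match = ("baguete", "wrd")

print(lcs_length(text_sample, ''.join(words_to_match)))   # 10
print(lcs_length(text_sample, ' '.join(words_to_match)))  # 11, the space matches too
```

Joining with an empty string gives the 10 asked for in the question; joining with a space adds one more match because the space is itself part of the common subsequence.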

Upvotes: 0
