Slartibartfast

Reputation: 1190

Matching variants of a string in Python

My question is about identifying variants of a string in Python. Let me explain what I am trying to do:

A string such as "tom and jerry" could also be written (in lowercase) as:

  1. tom n jerry
  2. tom_jerry
  3. tom & jerry
  4. tom and jerry

and so on. As this minimal example shows, there are already four variants, and even if I built a dictionary of them, I would still miss a string such as "tom _ jerry". How can I recognize "tom and jerry" in all its forms? Writing many rules seems very inefficient. Is there a more efficient way to do this?
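A minimal sketch of the dictionary approach, assuming exact matching against a set of known spellings (the names `KNOWN_VARIANTS` and `is_tom_and_jerry` are hypothetical), makes the problem concrete: any spelling that is not listed, such as "tom _ jerry", is missed.

```python
# Hypothetical illustration: exact lookup against a fixed set of known spellings.
KNOWN_VARIANTS = {"tom n jerry", "tom_jerry", "tom & jerry", "tom and jerry"}


def is_tom_and_jerry(text: str) -> bool:
    # Exact membership test: only the spellings listed above are recognized
    return text.lower() in KNOWN_VARIANTS


if __name__ == "__main__":
    for candidate in ["tom & jerry", "Tom_Jerry", "tom _ jerry"]:
        print(candidate, is_tom_and_jerry(candidate))
```

Every new spelling needs another entry in the set, which is exactly the rule explosion I want to avoid.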

Upvotes: 1

Views: 160

Answers (2)

Michael Gathara

Reputation: 66

You could attempt this using difflib's SequenceMatcher, which scores how similar two strings are.

from difflib import SequenceMatcher

def checkMatch(firstWord: str, secondWord: str, strictness: float) -> bool:
    # ratio() returns a similarity score between 0.0 (nothing in common) and 1.0 (identical)
    ratio = SequenceMatcher(None, firstWord.strip(), secondWord.strip()).ratio()
    return ratio > strictness

if __name__ == "__main__":
    originalWord = "tom and jerry"
    toMatch = "tom_jerry"  # chose this one as it is the least likely in your example
    toMatch = toMatch.lower()  # str.lower() returns a new string, so reassign it; lowercase (or uppercase) both sides before matching
    strictness = 0.6  # a strictness of 0.6 means the strings must be fairly similar
    print(checkMatch(originalWord, toMatch, strictness))

You can learn more about how SequenceMatcher works here: https://towardsdatascience.com/sequencematcher-in-python-6b1e6f3915fc
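To see how a fixed threshold behaves across the spellings from the question, here is a small sketch that scores each one against "tom and jerry". The helper name `similarity` and the unrelated control string "bugs bunny" are my own choices for illustration; 0.6 is the same example strictness as in the answer's code.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    # Trim and lowercase both sides, then return SequenceMatcher's 0.0-1.0 ratio
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


if __name__ == "__main__":
    target = "tom and jerry"
    candidates = ["tom n jerry", "tom_jerry", "tom & jerry", "tom _ jerry", "bugs bunny"]
    for candidate in candidates:
        score = similarity(target, candidate)
        verdict = "match" if score > 0.6 else "no match"
        print(f"{candidate!r}: {score:.2f} -> {verdict}")
```

The threshold is a trade-off: lowering it accepts more spellings but also more unrelated strings, so it is worth tuning on real data.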

Upvotes: 1

OldManSeph

Reputation: 2699

This will find any of those combinations in a sentence, as long as it starts with "tom" and ends with "jerry":

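combo = "tom n jerry"
sentence = "This is an episode of " + combo + " that deals with something."
start = sentence.find("tom")
end = sentence.find("jerry")
if start != -1 and end > start:  # find() returns -1 when the word is absent
    print(sentence[start:end + len("jerry")])  # prints "tom n jerry"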
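A regex-based variant of the same idea (a swap-in technique, not part of this answer) can restrict what may appear between "tom" and "jerry", so unrelated text in between is not captured. The pattern and the name `find_tom_and_jerry` are hypothetical; the pattern assumes the only connectors are spaces, underscores, "and", "n" or "&".

```python
import re

# Hypothetical pattern: "tom", any run of spaces/underscores, an optional
# connector ("and", "n" or "&"), more spaces/underscores, then "jerry".
TOM_AND_JERRY = re.compile(r"\btom[\s_]*(?:and|n|&)?[\s_]*jerry\b", re.IGNORECASE)


def find_tom_and_jerry(sentence: str):
    # Return the matched text, or None when no variant appears in the sentence
    match = TOM_AND_JERRY.search(sentence)
    return match.group(0) if match else None


if __name__ == "__main__":
    for combo in ["tom n jerry", "tom_jerry", "tom & jerry", "tom and jerry", "tom _ jerry"]:
        sentence = "This is an episode of " + combo + " that deals with something."
        print(find_tom_and_jerry(sentence))
    print(find_tom_and_jerry("No cartoon here."))  # None
```

Note that this pattern also accepts "tom jerry" and "tomjerry"; tighten the separator part if those should not count.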

Upvotes: 2
