akbiggs

Reputation: 35171

How do I check whether an exact string exists in another string?

I'm running into a bit of a problem. I'm trying to write a program that highlights occurrences of a word or phrase inside another string, but only when the match is exact, that is, a whole word rather than part of a longer one. The part I'm having trouble with is detecting when the word I'm searching for is merely contained inside a larger word.

A quick example which shows this problem:

>>> indicators = ["therefore", "for", "since"]
>>> phrase = "... therefore, I conclude I am awesome."
>>> indicators_in_phrase = [indicator for indicator in indicators 
                            if indicator in phrase.lower()]
>>> print indicators_in_phrase
['therefore', 'for']

I do not want 'for' included in that list. I know why it is being included, but I can't think of any expression that could filter out substrings like that.

I've noticed other similar questions on the site, but each involves a regex solution, which is something I'm not comfortable with yet, especially in Python. Is there a reasonably easy way to solve this problem without using a regular expression? If not, the corresponding regex, and how it might be applied to the above example, would be very much appreciated.

Upvotes: 2

Views: 4048

Answers (8)

theReverseFlick

Reputation: 6044

  1. Create set of indicators
  2. Create set of phrases
  3. Find intersection

Code:

indicators = ["therefore", "for", "since"]
phrase = "... therefore, I conclude I am awesome."
print list(set(indicators).intersection(set( [ each.strip('.,') for each in phrase.split(' ')])))

Cheers:)

Upvotes: 1

rubik

Reputation: 9104

Regex is the simplest way! Hint:

re.compile(r'\btherefore\b')

Then you can change the word in the middle!

EDIT: I wrote this for you:

import re

indicators = ["therefore", "for", "since"]

phrase = "... therefore, I conclude I am awesome. "

def find(phrase, indicators):
    def _match(i):
        # re.escape keeps indicators containing regex metacharacters literal
        return re.compile(r'\b%s\b' % re.escape(i)).search(phrase)
    return [ind for ind in indicators if _match(ind)]

>>> find(phrase, indicators)
['therefore']

Upvotes: 2

Paulo Scardine

Reputation: 77251

It is one line with regex...

import re

indicators = ["therefore", "for", "since"]
phrase = "... therefore, I conclude I am awesome."

indicators_in_phrase = set(re.findall(r'\b(%s)\b' % '|'.join(map(re.escape, indicators)), phrase.lower()))

Upvotes: 2

ghostdog74

Reputation: 342333

You can strip the punctuation from your phrase, then split it so that each word is a separate item. Then you can do your string comparison against that list of words:

>>> indicators = ["therefore", "for", "since"]
>>> phrase = "... therefore, I conclude I am awesome."
>>> import string
>>> ''.join([ i for i in phrase.lower() if i not in string.punctuation]).strip().split()
['therefore', 'i', 'conclude', 'i', 'am', 'awesome']
>>> p = ''.join([ i for i in phrase.lower() if i not in string.punctuation]).strip().split()
>>> indicators_in_phrase = [indicator for indicator in indicators if indicator in p ]
>>> indicators_in_phrase
['therefore']

Upvotes: 0

Francis Potter

Reputation: 1639

Is the problem with "for" that it's inside "therefore" or that it's not a word? For example, if one of your indicators was "awe", would you want it to be included in indicators_in_phrase?

How would you want the following situation to be handled?

indicators = ["abc", "cde"]
phrase = "One abcde two"
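To make the two cases concrete, here is a minimal sketch of how a whole-word reading of the question would treat them. It assumes the intent is "match only complete words", which the question does not confirm; `whole_word_matches` is a hypothetical helper name, not something from the original post.

```python
import re

def whole_word_matches(indicators, phrase):
    # Hypothetical helper: keep only indicators that occur as complete words.
    # \b matches at a transition between a word character and a non-word one,
    # and re.escape keeps any regex metacharacters in an indicator literal.
    text = phrase.lower()
    return [ind for ind in indicators
            if re.search(r'\b' + re.escape(ind) + r'\b', text)]

phrase = "... therefore, I conclude I am awesome."

# "for" sits inside "therefore", so it is not reported.
original_case = whole_word_matches(["therefore", "for", "since"], phrase)

# "awe" sits inside "awesome", so it is not reported either.
awe_case = whole_word_matches(["awe"], phrase)

# Neither "abc" nor "cde" is a complete word in "One abcde two".
overlap_case = whole_word_matches(["abc", "cde"], "One abcde two")
```

Under that assumption, "awe" and both overlapping indicators are rejected for the same reason "for" is. If substrings of non-words should be accepted in some cases, a different rule would be needed.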

Upvotes: 0

jgritty

Reputation: 11915

I think what you are trying to do is something more like this:

words_in_phrase = phrase.split()

Now you'll have the words in a list like this:

['...', 'therefore,', 'I', 'conclude', 'I', 'am', 'awesome.']

Then compare the lists like so:

indicators_in_phrase = []
for word in words_in_phrase:
  if word in indicators:
    indicators_in_phrase.append(word)

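There are probably several ways to make this less verbose, but I prefer clarity. Also, you will have to think about removing punctuation, as in "awesome." and "therefore,", because those tokens will not compare equal to the bare indicators.

For that, use rstrip as in the other answer. Note that rstrip(',') only trims trailing commas, so a wider set of characters is needed to handle "awesome." as well.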
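A minimal sketch of the punctuation step, assuming it is acceptable to lowercase the phrase and to trim punctuation from both ends of each token. It uses `str.strip` with `string.punctuation` rather than `rstrip`, and `words_without_punctuation` is a hypothetical helper name introduced only for illustration.

```python
import string

def words_without_punctuation(phrase):
    # Hypothetical helper: split on whitespace, then trim punctuation
    # from both ends of every token.
    stripped = [token.strip(string.punctuation)
                for token in phrase.lower().split()]
    # Tokens made up entirely of punctuation (such as '...') end up empty,
    # so they are dropped here.
    return [word for word in stripped if word]

indicators = ["therefore", "for", "since"]
phrase = "... therefore, I conclude I am awesome."

words = words_without_punctuation(phrase)
indicators_in_phrase = [ind for ind in indicators if ind in words]
```

Punctuation inside a token, such as the apostrophe in "don't", is left alone, since only the ends of each token are trimmed.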

Upvotes: 1

pyfunc

Reputation: 66709

This is a little lengthy, but it gives the idea. Of course, regex is there to make it simple.

>>> indicators = ["therefore", "for", "since"]
>>> phrase = "... therefore, I conclude I am awesome."
>>> phrase_list = phrase.split()
>>> phrase_list
['...', 'therefore,', 'I', 'conclude', 'I', 'am', 'awesome.']
>>> phrase_list = [ k.rstrip(',') for k in phrase_list]
>>> indicators_in_phrase = [indicator for indicator in indicators if indicator in phrase_list]
>>> indicators_in_phrase 
['therefore']

Upvotes: 0

Ignacio Vazquez-Abrams

Reputation: 798606

There are ways to do it without a regex, but most of those ways are so convoluted that you'll wish you had spent the time learning the simple regex sequence that you need for it.
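One case where the split-and-compare approaches get awkward is an indicator made of several words, since the question mentions matching "a word or phrase". The sketch below shows how the word-boundary regex could cover that case. The helper name `find_indicators` and the sample sentence are assumptions made for illustration, and the approach assumes every indicator starts and ends with a letter, digit, or underscore.

```python
import re

def find_indicators(indicators, phrase):
    # Hypothetical helper, not taken from any answer in this thread.
    # re.escape keeps spaces and regex metacharacters in an indicator literal.
    # \b only matches where a word character meets a non-word character,
    # hence the assumption about how indicators begin and end.
    text = phrase.lower()
    found = []
    for indicator in indicators:
        pattern = r'\b' + re.escape(indicator.lower()) + r'\b'
        if re.search(pattern, text):
            found.append(indicator)
    return found

# Made-up sentence containing one multi-word indicator.
sentence = "Therefore, it follows that I am awesome."
result = find_indicators(["it follows that", "for", "follow"], sentence)
```

Here "for" is rejected because it only appears inside "therefore", and "follow" is rejected because it only appears inside "follows".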

Upvotes: 5
