Reputation: 35171
I'm currently running into a bit of a problem. I'm trying to write a program that will highlight occurrences of a word or phrase inside of another string, but only if the string it's being matched to is exactly the same. The part I'm running into troubles with is identifying whether or not the subphrase I'm matching the phrase with is contained within another larger subphrase.
A quick example which shows this problem:
>>> indicators = ["therefore", "for", "since"]
>>> phrase = "... therefore, I conclude I am awesome."
>>> indicators_in_phrase = [indicator for indicator in indicators
if indicator in phrase.lower()]
>>> print indicators_in_phrase
['therefore', 'for']
I do not want 'for' included in that list. I know why it is being included, but I can't think of any expression that could filter out substrings like that.
I've noticed other similar questions on the site, but each involves a Regex solution, which is something I'm not feeling comfortable with yet, especially not in Python. Is there any kind-of-easy way to solve this problem without using a Regex expression? If not, the corresponding Regex expression and how it might be implemented in the above example would be very much appreciated.
Upvotes: 2
Views: 4048
Reputation: 6044
Code:
indicators = ["therefore", "for", "since"]
phrase = "... therefore, I conclude I am awesome."
print list(set(indicators).intersection(set( [ each.strip('.,') for each in phrase.split(' ')])))
Cheers:)
Upvotes: 1
Reputation: 9104
The regex are the simplest way! Hint:
re.compile(r'\btherefore\b')
Then you can change the word in the middle!
EDIT: I wrote this for you:
import re
indicators = ["therefore", "for", "since"]
phrase = "... therefore, I conclude I am awesome. "
def find(phrase, indicators):
def _match(i):
return re.compile(r'\b%s\b' % (i)).search(phrase)
return [ind for ind in indicators if _match(ind)]
>>> find(phrase, indicators)
['therefore']
Upvotes: 2
Reputation: 77251
It is one line with regex...
import re
indicators = ["therefore", "for", "since"]
phrase = "... therefore, I conclude I am awesome."
indicators_in_phrase = set(re.findall(r'\b(%s)\b' % '|'.join(indicators), phrase.lower()))
Upvotes: 2
Reputation: 342333
You can strip off punctuations from your phrase, then do split on it so that all words are individual. Then you can do your string comparison
>>> indicators = ["therefore", "for", "since"]
>>> phrase = "... therefore, I conclude I am awesome."
>>> ''.join([ i for i in phrase.lower() if i not in string.punctuation]).strip().split()
['therefore', 'I', 'conclude', 'I', 'am', 'awesome']
>>> p = ''.join([ i for i in phrase.lower() if i not in string.punctuation]).strip().split()
>>> indicators_in_phrase = [indicator for indicator in indicators if indicator in p ]
>>> indicators_in_phrase
['therefore']
Upvotes: 0
Reputation: 1639
Is the problem with "for" that it's inside "therefore" or that it's not a word? For example, if one of your indicators was "awe", would you want it to be included in indicators_in_phrase?
How would you want the following situation to be handled? indicators = ["abc", "cde"] phrase = "One abcde two"
Upvotes: 0
Reputation: 11915
I think what you are trying to do is something more like this:
import string
words_in_phrase = string.split(phrase)
Now you'll have the words in a list like this:
['...', 'therefore,', 'I', 'conclude', 'I', 'am', 'awesome.']
Then compare the lists like so:
indicators_in_phrase = []
for word in words_in_phrase:
if word in indicators:
indicators_in_phrase.append(word)
There's probably several ways to make this less verbose, but I prefer clarity. Also, you might have to think about removing punctuation as in "awesome." and "therefore,"
For that use rstrip as in the other answer
Upvotes: 1
Reputation: 66709
A little lengthy but gives an idea / of course regex is there to make it simple
>>> indicators = ["therefore", "for", "since"]
>>> phrase = "... therefore, I conclude I am awesome."
>>> phrase_list = phrase.split()
>>> phrase_list
['...', 'therefore,', 'I', 'conclude', 'I', 'am', 'awesome.']
>>> phrase_list = [ k.rstrip(',') for k in phrase_list]
>>> indicators_in_phrase = [indicator for indicator in indicators if indicator in phrase_list]
>>> indicators_in_phrase
['therefore']
Upvotes: 0
Reputation: 798606
There are ways to do it without a regex, but most of those ways are so convoluted that you'll wish you had spent the time learning the simple regex sequence that you need for it.
Upvotes: 5