Reputation: 2365
I have the following script to check if a string contains a list item:
word = ['one',
        'two',
        'three']
string = 'my favorite number is two'
if any(word_item in string.split() for word_item in word):
    print 'string contains a word from the word list: %s' % (word_item)
This works, but I'm trying to print the list item(s) that the string contains. What am I doing wrong?
Upvotes: 4
Views: 6636
Reputation: 2130
You can use set intersection:
word = ['one', 'two', 'three']
string = 'my favorite number is two'
co_occuring_words = set(word) & set(string.split())
for word_item in co_occuring_words:
    print 'string contains a word from the word list: %s' % (word_item)
Upvotes: 1
Reputation: 54242
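The problem is that you're using an if statement instead of a for statement, so your print runs at most once (if at least one word matches). By that point, any has already finished iterating (it stops at the first match), and the word_item variable inside the generator expression is not available outside it.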
This is the easiest way to do what you want:
words = ['one',
         'two',
         'three']
string = 'my favorite number is two'
for word in words:
    if word in string.split():
        print('string contains a word from the word list: %s' % (word))
If you want this to be functional for some reason, you could do it like this:
for word in filter(string.split().__contains__, words):
    print('string contains a word from the word list: %s' % (word))
Since someone is bound to post a performance-related answer even though this question has nothing to do with performance: it would be more efficient to split the string once rather than on every iteration, and, depending on how many words you want to check, converting the result to a set might also be useful.
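As a rough sketch of that optimization (the helper name find_words is made up for this example), the string is split a single time and the tokens are stored in a set, so each lookup no longer rescans a list:

```python
def find_words(words, text):
    """Return the items of `words` that occur as whole tokens in `text`."""
    # Split once, up front, rather than on every loop iteration.
    tokens = set(text.split())
    # Iterating over `words` (not the set) keeps the original list order.
    return [word for word in words if word in tokens]


words = ['one', 'two', 'three']
string = 'my favorite number is two'

for word in find_words(words, string):
    print('string contains a word from the word list: %s' % word)
```

For a three-item list this makes no measurable difference; it only starts to matter when the word list or the text gets large.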
Regarding your question in the comments, if you want multi-word "words", there are two easy options: adding whitespace and then searching for the words in the full string, or regular expressions with word boundaries.
The simplest way is to add a space character before and after the text to search and then search for ' ' + word + ' ':
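phrases = ['one',
           'two',
           'two words']
text = "this has two words in it"
# Pad the text with spaces so phrases at the very start or end also match.
padded_text = " %s " % text
for phrase in phrases:
    if " %s " % phrase in padded_text:
        print("text '%s' contains phrase '%s'" % (text, phrase))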
For regular expressions, just use the \b word boundary:
import re
for phrase in phrases:
    if re.search(r"\b%s\b" % re.escape(phrase), text):
        print("text '%s' contains phrase '%s'" % (text, phrase))
Which one is "nicer" is hard to say, but the regular expression is probably significantly less efficient (if that matters to you).
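One practical difference between the two, apart from speed: the padding trick only recognizes spaces as separators, while \b also treats punctuation as a boundary. A small sketch to illustrate (the helper names contains_padded and contains_regex, and the sample text, are invented for this example):

```python
import re


def contains_padded(phrase, text):
    # Pad both sides so phrases at the very start or end can still match.
    return " %s " % phrase in " %s " % text


def contains_regex(phrase, text):
    # \b matches at any transition between word and non-word characters.
    return re.search(r"\b%s\b" % re.escape(phrase), text) is not None


text = "this has two words."

for phrase in ['two', 'two words']:
    # Show both results side by side for each phrase.
    print("%r: padded=%s regex=%s" % (
        phrase, contains_padded(phrase, text), contains_regex(phrase, text)))
```

Here the trailing period hides 'two words' from the padded search but not from the regular expression, so the choice may depend on how clean your text is.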
And if you don't care about word boundaries, you can just do:
phrases = ['one',
           'two',
           'two words']
text = "the word 'tone' will be matched, but so will 'two words'"
for phrase in phrases:
    if phrase in text:
        print("text '%s' contains phrase '%s'" % (text, phrase))
Upvotes: 6
Reputation: 180391
If you have a multi-word item like 'ninety five', you could split it and check that all of its words appear in a set of the words in the string:
words = ['one',
         'two',
         'three', "fifty ninety"]
string = set('my favorite number is two fifty five'.split())
for word in words:
    spl = word.split()
    if len(spl) > 1:
        if all(string.intersection([w]) for w in spl):
            print(word)
    elif string.intersection([word]):
        print(word)
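Because this only tests set membership, it will also match when the words of an item appear in the string in a different order or are not adjacent (for example, 'five two' would match here), so you need to decide whether that is workable. Using intersection for single words works well. Make sure you wrap the word in a list or a tuple, or "foo" will be treated as the set of its characters, {"f", "o"}.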
You can also use set.issuperset instead of all:
for word in words:
    spl = word.split()
    if len(spl) > 1:
        if string.issuperset(spl):
            print(word)
    elif string.intersection([word]):
        print(word)
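If matching words regardless of order or adjacency is too loose, one alternative is to compare the item against each run of consecutive tokens in the string. This is only a sketch, and the helper name contains_phrase and the extra list items are hypothetical:

```python
def contains_phrase(phrase, text):
    """True if the words of `phrase` appear consecutively, in order, in `text`."""
    needle = phrase.split()
    tokens = text.split()
    n = len(needle)
    if n == 0:
        return False
    # Slide a window of n tokens across the text and compare each slice.
    return any(tokens[i:i + n] == needle for i in range(len(tokens) - n + 1))


words = ['one', 'two', 'three', 'fifty five', 'five fifty']
string = 'my favorite number is two fifty five'

for word in words:
    if contains_phrase(word, string):
        print(word)
```

This trades the set's fast lookups for a linear scan per item, which is usually fine for short strings.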
Upvotes: 3