Reputation: 17617
I want to match a list of words with an string and get how many of the words are matched.
Now I have this:
import re
words = ["red", "blue"]
exactMatch = re.compile(r'\b%s\b' % '\\b|\\b'.join(words), flags=re.IGNORECASE)
print exactMatch.search("my blue cat")
print exactMatch.search("my red car")
print exactMatch.search("my red and blue monkey")
print exactMatch.search("my yellow dog")
My current regex will match the first 3, but I would like to find out how many of the words in the list words
that matches the string passed to search
. Is this possible without making a new re.compile
for each word in the list?
Or is there another way to achieve the same thing?
The reason I want to keep the number of re.compile
to a minimum is speed, since in my application I have multiple word lists and about 3500 strings to search against.
Upvotes: 2
Views: 15085
Reputation: 9704
If I got right the question, you only want to know the number of matches of blue or red in a sentence.
>>> exactMatch = re.compile(r'%s' % '|'.join(words), flags=re.IGNORECASE)
>>> print exactMatch.findall("my blue blue cat")
['blue', 'blue']
>>> print len(exactMatch.findall("my blue blue cat"))
2
You need more code if you want to test multiple colors
Upvotes: 3
Reputation: 92986
If you use findall
instead of search
, then you get a tuple as result containing all the matched words.
print exactMatch.findall("my blue cat")
print exactMatch.findall("my red car")
print exactMatch.findall("my red and blue monkey")
print exactMatch.findall("my yellow dog")
will result in
['blue']
['red']
['red', 'blue']
[]
If you need to get the amount of matches you get them using len()
print len(exactMatch.findall("my blue cat"))
print len(exactMatch.findall("my red car"))
print len(exactMatch.findall("my red and blue monkey"))
print len(exactMatch.findall("my yellow dog"))
will result in
1
1
2
0
Upvotes: 11
Reputation: 4191
Why not storing all words in a hash and iterate a lookup of every words in sentences thru a finditer
words = { "red": 1 .... }
word = re.compile(r'\b(\w+)\b')
for i in word.finditer(sentence):
if words.get(i.group(1)):
....
Upvotes: 1