fredrik

Reputation: 17617

Python RegEx, match words in string and get count

I want to match a list of words against a string and count how many of the words match.

Now I have this:

import re
words = ["red", "blue"]
exactMatch = re.compile(r'\b%s\b' % '\\b|\\b'.join(words), flags=re.IGNORECASE)
print(exactMatch.search("my blue cat"))
print(exactMatch.search("my red car"))
print(exactMatch.search("my red and blue monkey"))
print(exactMatch.search("my yellow dog"))

My current regex matches the first three, but I would like to find out how many of the words in the list words match the string passed to search. Is this possible without calling re.compile for each word in the list?

Or is there another way to achieve the same thing?

The reason I want to keep the number of re.compile calls to a minimum is speed: in my application I have multiple word lists and about 3500 strings to search against.

Upvotes: 2

Views: 15085

Answers (4)

Diego Navarro

Reputation: 9704

If I understood the question correctly, you only want to know the number of matches of blue or red in a sentence.

>>> exactMatch = re.compile(r'%s' % '|'.join(words), flags=re.IGNORECASE)
>>> print(exactMatch.findall("my blue blue cat"))
['blue', 'blue']
>>> print(len(exactMatch.findall("my blue blue cat")))
2

You need more code if you want to count each color separately.
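
One way to read this is that a separate tally per color is wanted. A minimal sketch, assuming that reading, feeds the findall results into collections.Counter. The helper name count_per_word is hypothetical, and lower-casing the matches (so that "Blue" and "blue" are tallied together) is an assumption on my part:

```python
import re
from collections import Counter

words = ["red", "blue"]
# Same unanchored alternation as above; re.escape guards against
# regex metacharacters in the word list.
pattern = re.compile("|".join(re.escape(w) for w in words), flags=re.IGNORECASE)

def count_per_word(text):
    # Hypothetical helper: tally each matched word, ignoring case.
    return Counter(m.lower() for m in pattern.findall(text))

print(count_per_word("my blue Blue cat and a red car"))
```

The pattern is still compiled only once, so the same object can be reused for every string.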

Upvotes: 3

stema

Reputation: 92986

If you use findall instead of search, you get a list containing all the matched words.

print(exactMatch.findall("my blue cat"))
print(exactMatch.findall("my red car"))
print(exactMatch.findall("my red and blue monkey"))
print(exactMatch.findall("my yellow dog"))

will result in

['blue']
['red']
['red', 'blue']
[]

If you need the number of matches, you can get it using len():

print(len(exactMatch.findall("my blue cat")))
print(len(exactMatch.findall("my red car")))
print(len(exactMatch.findall("my red and blue monkey")))
print(len(exactMatch.findall("my yellow dog")))

will result in

1
1
2
0
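
Note that len() on the findall result counts occurrences, so a word that appears twice is counted twice. If the goal is how many different words from the list appear, a rough sketch is to collect the matches into a set first. The helper name count_distinct is hypothetical, and the use of re.escape and a non-capturing group is my own variation on the pattern from the question:

```python
import re

words = ["red", "blue"]
# One compiled pattern for the whole list; \b keeps matches on word
# boundaries and (?:...) makes findall return the full match.
exactMatch = re.compile(r"\b(?:%s)\b" % "|".join(map(re.escape, words)),
                        flags=re.IGNORECASE)

def count_distinct(text):
    # Hypothetical helper: number of different list words found in text.
    return len(set(m.lower() for m in exactMatch.findall(text)))

for s in ["my blue cat", "my red and blue monkey",
          "my blue blue cat", "my yellow dog"]:
    print(s, "->", count_distinct(s))
```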

Upvotes: 11

bash-o-logist

Reputation: 6911

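words = ["red", "blue"]
searchterm = "my red and blue monkey"
for w in words:
    # Plain substring test: this also matches inside longer words, e.g. "red" in "bored"
    if w in searchterm:
        print("found")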

Upvotes: 1

VGE

Reputation: 4191

Why not store all the words in a hash and look up every word of the sentence, iterating over it with finditer?

  words = { "red": 1 .... }
  word = re.compile(r'\b(\w+)\b')
  for i in word.finditer(sentence): 
     if words.get(i.group(1)):
       ....
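
The outline above could be filled in roughly as follows. This is only a sketch: it swaps the dict for a set (same constant-time lookup), the helper name matched_words is hypothetical, and lower-casing each token to get a case-insensitive match is an assumption:

```python
import re

words = {"red", "blue"}          # a set gives the same fast lookup as a dict
word = re.compile(r"\b(\w+)\b")  # tokenizer pattern from the outline above

def matched_words(sentence):
    # Hypothetical helper: distinct list words present in the sentence.
    return {m.group(1).lower() for m in word.finditer(sentence)
            if m.group(1).lower() in words}

print(len(matched_words("my Red and blue monkey")))
```

With this approach the regex never changes, so several word lists can share the one compiled tokenizer and differ only in the set that is looked up.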

Upvotes: 1
