Hanpan

Reputation: 10251

Python regex multiple search

I need to search a string for multiple words.

import re

words = [{'word':'test1', 'case':False}, {'word':'test2', 'case':False}]

status = "test1 test2"

for w in words:
    if w['case']:
        r = re.compile(r"\s#?%s" % w['word'], re.IGNORECASE|re.MULTILINE)
    else:
        r = re.compile(r"\s#?%s" % w['word'], re.MULTILINE)
    if r.search(status):
        print("Found word %s" % w['word'])

For some reason, this will only ever find "test2" and never "test1". Why is this?

I know I could join the words with | into a single pattern, but there could be hundreds of words, which is why I am using a for loop.

Upvotes: 7

Views: 6646

Answers (2)

Norbert P.

Reputation: 2807

As Martijn pointed out, there is no space before test1. Your code also matches a word that is only the beginning of a longer one: it would find test2blabla as an instance of test2, and I'm not sure that is what you want.

I suggest using the word boundary anchor \b:

for w in words:
    if w['case']:
        r = re.compile(r"\b%s\b" % w['word'], re.IGNORECASE|re.MULTILINE)
    else:
        r = re.compile(r"\b%s\b" % w['word'], re.MULTILINE)
    if r.search(status):
        print("Found word %s" % w['word'])

EDIT:

I should have pointed out that if you really want to allow only the (whitespace)word or (whitespace)#word format, you cannot use \b, because \b also matches after any other non-word character, such as punctuation.
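One possible way to keep the stricter format while still rejecting longer words is to use lookarounds instead of \b. This is only a sketch: the helper name make_pattern, the use of re.escape (which assumes each word is literal text), and the trailing (?!\w) check are assumptions added for illustration, not part of the original code.

```python
import re

def make_pattern(word, ignore_case=False):
    """Build a pattern for (whitespace)word or (whitespace)#word."""
    flags = re.IGNORECASE if ignore_case else 0
    # (?<!\S) : the previous character is whitespace, or there is none
    # #?      : optional leading hash
    # (?!\w)  : the next character is not a word character
    return re.compile(r"(?<!\S)#?" + re.escape(word) + r"(?!\w)", flags)

pattern = make_pattern("test2")
samples = ["test2", "x #test2 y", "test2blabla", "foo.test2", "a#test2"]
results = {s: bool(pattern.search(s)) for s in samples}
for s in samples:
    print(s, results[s])
```

With this pattern, foo.test2 is rejected, whereas \btest2\b would accept it.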

Upvotes: 2

Martijn Pieters

Reputation: 1121366

There is no space before test1 in status, while your generated regular expressions require there to be a space.
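A quick check along these lines makes the difference visible. It is a minimal sketch that hard-codes the two patterns the loop would generate; the variable names m1, m2 and m1_padded are made up for the example.

```python
import re

status = "test1 test2"

# The generated pattern demands one whitespace character before the word.
m1 = re.search(r"\s#?test1", status)   # "test1" starts the string
m2 = re.search(r"\s#?test2", status)   # "test2" follows a space

print(m1)
print(m2)

# With a leading space added to the input, "test1" is found as well.
m1_padded = re.search(r"\s#?test1", " " + status)
print(m1_padded is not None)
```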

You can modify the pattern to match either after whitespace or at the beginning of a line:

for w in words:
    if w['case']:
        r = re.compile(r"(^|\s)#?%s" % w['word'], re.IGNORECASE|re.MULTILINE)
    else:
        r = re.compile(r"(^|\s)#?%s" % w['word'], re.MULTILINE)
    if r.search(status):
        print("Found word %s" % w['word'])
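For reference, the same loop could be wrapped up as a self-contained function. This is a sketch under a few assumptions: the function name find_words is hypothetical, the flags are computed once per word instead of duplicating the compile call, and re.escape is added on the assumption that each word is literal text rather than a regex fragment.

```python
import re

def find_words(words, status):
    """Return the words from `words` that occur in `status`."""
    found = []
    for w in words:
        # Same flag logic as the loop above: 'case' switches on IGNORECASE.
        flags = re.MULTILINE | (re.IGNORECASE if w['case'] else 0)
        # re.escape is an added assumption: each word is plain text.
        pattern = re.compile(r"(^|\s)#?" + re.escape(w['word']), flags)
        if pattern.search(status):
            found.append(w['word'])
    return found

words = [{'word': 'test1', 'case': False}, {'word': 'test2', 'case': False}]
found = find_words(words, "test1 test2")
print(found)
```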

Upvotes: 9
