Markum

Reputation: 4049

Regex behaving unexpectedly

Script:

import re

matches = ['hello', 'hey', 'hi', 'hiya']

def check_match(string):
    for item in matches:
        if re.search(item, string):
            print 'Match found: ' + string
        else:
            print 'Match not found: ' + string

check_match('hey')
check_match('hello there')
check_match('this should not match')
check_match('oh, hiya')

Output:

Match not found: hey
Match found: hey
Match not found: hey
Match not found: hey
Match found: hello there
Match not found: hello there
Match not found: hello there
Match not found: hello there
Match not found: this should not match
Match not found: this should not match
Match found: this should not match
Match not found: this should not match
Match not found: oh, hiya
Match not found: oh, hiya
Match found: oh, hiya
Match found: oh, hiya

There are several things I don't understand. For starters, each string is searched four times in this output, and the number of "found" results varies from string to string. I'm not sure what in my code is causing this to happen. Could someone take a look and see what's wrong?

The expected output would be this:

Match found: hey
Match found: hello there
Match not found: this should not match
Match found: oh, hiya

Upvotes: 1

Views: 124

Answers (4)

Bart Kiers

Reputation: 170148

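It's not behaving incorrectly; the problem is a misconception about re.search(...). It reports a match if the pattern occurs anywhere in the string, and your loop prints a result for every pattern in the list.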

See the comments after your output:

Match not found: hey                    # because 'hello' is not in 'hey'
Match found: hey                        # because 'hey' is in 'hey'
Match not found: hey                    # because 'hi' is not in 'hey'
Match not found: hey                    # because 'hiya' is not in 'hey'

Match found: hello there                # because 'hello' is in 'hello there'
Match not found: hello there            # because 'hey' is not in 'hello there'
Match not found: hello there            # because 'hi' is not in 'hello there'
Match not found: hello there            # because 'hiya' is not in 'hello there'

Match not found: this should not match  # because 'hello' is not in 'this should not match'
Match not found: this should not match  # because 'hey' is not in 'this should not match'
Match found: this should not match      # because 'hi' is in 'this should not match'
Match not found: this should not match  # because 'hiya' is not in 'this should not match'

Match not found: oh, hiya               # because 'hello' is not in 'oh, hiya'
Match not found: oh, hiya               # because 'hey' is not in 'oh, hiya'
Match found: oh, hiya                   # because 'hi' is in 'oh, hiya'
Match found: oh, hiya                   # because 'hiya' is in 'oh, hiya'

If you don't want the pattern hi to match the input 'oh, hiya', wrap the pattern in word boundaries:

\bhi\b

which will cause it to match only occurrences of hi that are not adjacent to other word characters (letters, digits, or underscore). For example, 'well hiya there' would not match the pattern \bhi\b, but 'well hi there' would.
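A minimal sketch of the word-boundary behaviour described above, written for Python 3. The helper name has_word_hi is hypothetical and only serves the illustration:

```python
import re

# \b matches at the edge between a word character and a non-word character,
# so this pattern only matches "hi" as a standalone word.
HI_WORD = re.compile(r'\bhi\b')

def has_word_hi(text):
    """Return True if 'hi' occurs in text as a whole word (hypothetical helper)."""
    return HI_WORD.search(text) is not None

print(has_word_hi('well hi there'))          # standalone word
print(has_word_hi('well hiya there'))        # 'hi' is followed by 'y'
print(has_word_hi('this should not match'))  # 'hi' is inside 'this'
```

Note the raw string (r'...'): without it, Python would interpret \b as a backspace character rather than a regex word boundary.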

Upvotes: 4

Maria Zverina

Reputation: 11163

Try this - it's more concise and it will flag up multiple matches:

import re

matches = ['hello', 'hey', 'hi', 'hiya']

def check_match(string):
    results = [item for item in matches if re.search(r'\b%s\b' % (item), string)]
    print 'Found %s' % (results) if len(results) > 0 else "No match found"

check_match('hey')
check_match('hello there')
check_match('this should not match')
check_match('oh, hiya')
check_match('xxxxx xxx')
check_match('hello and hey')

Gives:

Found ['hey']
Found ['hello']
No match found
Found ['hiya']
No match found
Found ['hello', 'hey']
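As a possible variation (not part of the answer above), the list could be folded into a single compiled pattern. This is a sketch for Python 3 that assumes the entries in matches are meant as literal text, hence re.escape; the names word_pattern and find_words are made up for the example:

```python
import re

matches = ['hello', 'hey', 'hi', 'hiya']

# Build one alternation such as \b(?:hello|hey|hi|hiya)\b.
# re.escape protects against entries that contain regex metacharacters.
word_pattern = re.compile(r'\b(?:%s)\b' % '|'.join(re.escape(m) for m in matches))

def find_words(text):
    """Return every listed word found in text as a whole word, in order of appearance."""
    return word_pattern.findall(text)

for sample in ['hey', 'hello there', 'this should not match',
               'oh, hiya', 'hello and hey']:
    print(sample, '->', find_words(sample))
```

Unlike the list comprehension, this returns words in the order they appear in the input and would list a word twice if it occurs twice.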

Upvotes: 2

Varun

Reputation: 1

The for loop checks the string against each of the 'matches' and prints "found" or "not found" for each one. What you really want is to see if ANY of the matches match, and then print a single "found" or "not found". I don't actually know Python, so the syntax may be off.

found = False  # start by assuming nothing matches
for item in matches:
    if re.search(item, string):
        found = True
        break  # one hit is enough, stop searching
if found:
    print 'Match found: ' + string
else:
    print 'Match not found: ' + string
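The same "any of the patterns" idea can be sketched with the built-in any(). This version is written for Python 3, returns the message instead of printing it directly, and adds the \b word boundaries from the earlier answer so that 'hi' inside 'this' does not count; re.escape assumes the list entries are literal text:

```python
import re

matches = ['hello', 'hey', 'hi', 'hiya']

def check_match(string):
    # any() stops at the first pattern that matches, so each input
    # yields exactly one result line.
    found = any(re.search(r'\b%s\b' % re.escape(item), string)
                for item in matches)
    prefix = 'Match found: ' if found else 'Match not found: '
    return prefix + string

for s in ['hey', 'hello there', 'this should not match', 'oh, hiya']:
    print(check_match(s))
```

With the four inputs from the question this produces one line per input, matching the expected output listed there.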


Upvotes: 0

CrayonViolent

Reputation: 32532

You get four searches and four outputs for each string because you are looping through a list, searching and printing something for each element in the list.

Upvotes: 0
