Art

Reputation: 1037

Python regex problem

What I am trying to do: Parse a query for a leading or trailing ? which will result in a search on the rest of the string.

"foobar?" or "?foobar" results in a search. "foobar" results in some other behavior.

This code works as expected in the interpreter:

 >>> import re
 >>> print re.match(".+\?\s*$","foobar?")
 <_sre.SRE_Match object at 0xb77c4d40>
 >>> print re.match(".+\?\s*$","foobar")
 None

This code from a Django app does not:

doSearch = { "text":"Search for: ", "url":"http://www.google.com/#&q=QUERY", "words":["^\?\s*",".+\?\s*$"] }
...
subQ = myCore.lookForPrefix(someQuery, doSearch["words"])
...
def lookForPrefix(query,listOfPrefixes):
    for l in listOfPrefixes:
        if re.match(l, query):
            return re.sub(l,'', query)
    return False

The Django code never matches the trailing "?"; all other regexes work fine.

Any ideas about why not?

Upvotes: 1

Views: 330

Answers (2)

Sean Reifschneider

Reputation: 1261

You probably want to use raw strings for regexes, such as r'^\s\?'. Raw strings prevent problems with escaped characters becoming other values (r'\0' is the same as '\\0', but different from '\0', which is a single null character).
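A minimal sketch of the difference, written in Python 3 syntax. The pattern and sample text are made up for illustration; \b is used because it is an escape that both the string parser and the regex engine understand, with different meanings.

```python
import re

# In a regular string, "\b" is a single backspace character.
# In a raw string, r"\b" is a backslash plus "b", which the regex
# engine reads as a word boundary.
regular = "\bfoo\b"
raw = r"\bfoo\b"

print(len(regular), len(raw))                  # 5 7
print(re.search(regular, "a foo b"))           # None
print(re.search(raw, "a foo b") is not None)   # True

# The raw form of the question's pattern is the same as doubling
# every backslash by hand.
print(r"\?\s*$" == "\\?\\s*$")                 # True
```

The patterns in the question happen to survive as regular strings because \? and \s are not string escapes, but recent Python versions emit a warning for such sequences.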

Also, r'^\?\s*|\?\s*$' does work as Max S. intended: | has the lowest precedence of any regex operator, so it alternates between the whole of ^\?\s* and the whole of \?\s*$, not just between \s* and \?. The regex proposed in the EDIT reads as: a question mark at the beginning of the line followed by any number of spaces, OR a question mark followed by any number of spaces and the end of the line.

If you prefer to make the grouping explicit, r'(^\?\s*)|(\?\s*$)' is equivalent. It reads the same way: a question mark followed by any number of spaces, at the beginning or end of the line.

Upvotes: 0

Max Shawabkeh

Reputation: 38683

The problem is in your second regex. It matches the whole query, so using re.sub() will replace it all with an empty string. I.e. lookForPrefix('foobar?',listOfPrefixes) will return ''. You are likely checking the return value in an if, so it evaluates the empty string as false.
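To see this in isolation, here is a small check in Python 3 syntax using the question's second pattern (the variable names are only for illustration):

```python
import re

trailing = r".+\?\s*$"   # the question's second pattern, as a raw string

# The pattern covers the whole query, so re.sub removes all of it.
result = re.sub(trailing, "", "foobar?")
print(repr(result))      # ''

# An empty string is falsy, so a plain truth test cannot tell it
# apart from the False that lookForPrefix returns on no match.
print(bool(result))      # False
print(result is False)   # False
```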

To solve this, you just need to change the second regex to \?\s*$ and use re.search() instead of re.match(), as the latter requires that your regex matches from the beginning of the string.
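A short sketch of how the two functions differ on the shortened pattern, in Python 3 syntax:

```python
import re

pattern = r"\?\s*$"

# re.match only tries the pattern at position 0, where "f" is not "?".
print(re.match(pattern, "foobar?"))            # None

# re.search scans forward until the pattern fits somewhere.
print(re.search(pattern, "foobar?").group())   # ?

# Only the matched suffix is removed, so the query text survives.
print(re.sub(pattern, "", "foobar?  "))        # foobar
```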

import re

doSearch = { "text":"Search for: ", "url":"http://www.google.com/#&q=QUERY", "words":[r"^\?\s*",r"\?\s*$"] }

def lookForPrefix(query,listOfPrefixes):
    for l in listOfPrefixes:
        if re.search(l, query):
            return re.sub(l,'', query)
    return False

The result:

>>> lookForPrefix('?foobar', doSearch["words"])
'foobar'
>>> lookForPrefix('foobar?', doSearch["words"])
'foobar'
>>> lookForPrefix('foobar', doSearch["words"])
False

EDIT: In fact, you might as well combine the two regexes into one: ^\?\s*|\?\s*$. That will work equally well.
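A possible sketch of the combined version, in Python 3 syntax. The names MARKER and strip_search_marker are hypothetical, and returning None instead of False is my own assumption, chosen so that an empty result can be told apart from "no match".

```python
import re

# The combined pattern from the EDIT, compiled once.
MARKER = re.compile(r"^\?\s*|\?\s*$")

def strip_search_marker(query):
    """Return query without a leading/trailing '?', or None if it has neither."""
    if MARKER.search(query) is None:
        return None
    return MARKER.sub("", query)

print(strip_search_marker("?foobar"))    # foobar
print(strip_search_marker("foobar?"))    # foobar
print(strip_search_marker("?foobar?"))   # foobar
print(strip_search_marker("foobar"))     # None
```

Unlike the two-pattern loop, this strips a marker from both ends in one pass. Callers should test the result with `is not None` rather than plain truthiness, since a query of just "?" returns an empty string.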

Upvotes: 3
