Melissa

Reputation: 55

What is the correct syntax for combining a regular expression stored in a variable with a quantifier?

I know there are already a lot of questions on Stack Overflow about using a variable in a regular expression, and I managed to make it work when the variable is a single word or only needs to match once. However, once the variable contains a special character or whitespace and I also add a quantifier, I can't get it to match. For example, I want to match any string that contains 3 consecutive copies of whatever is in some_var:

import re

some_var = "what what"

should_match = "what what what what what what hey"
not_a_match = "what what what what hey what what"

match = re.search(re.escape(some_var){3}, should_match)
no_match = re.search(re.escape(some_var){3}, not_a_match)

However, the last two lines give me a syntax error, and I've also tried:

'(.*)'+re.escape(some_var){3}+'(.*)'
('(.*)'+re.escape(some_var)+'(.*)'){3}
'(.*)'+'re.escape(some_var){3}'+'(.*)'
're.escape(some_var){3}'

... I just can't seem to get the syntax for it to match correctly (I keep getting the false conditional). I've tried searching for the answer, but I'm not sure how to get it to recognize the quantifier properly.

Upvotes: 2

Views: 502

Answers (2)

dawg

Reputation: 104072

Regex patterns are just strings, and re.escape(some_var) returns a string in which the characters that are special in a regex (including the space) are backslash-escaped so that they match literally. You can therefore build the pattern string you need with format, the % operator, or concatenation.

Given some value of n as a quantifier, in this case 3, you need to construct the regex string accordingly. The {3} part has to be inside the pattern string itself, placed right after a group that wraps the escaped text; writing re.escape(some_var){3} as Python code is a syntax error.

You can use the % operator:

>>> n=3
>>> r'(?:\s*%s){%i}' % (re.escape(some_var), n)
'(?:\\s*what\\ what){3}'

Or, use format:

>>> r'(?:\s*{0}){{{1}}}'.format(re.escape(some_var), n)
'(?:\\s*what\\ what){3}'

Or use concatenation:

>>> r'(?:\s*'+re.escape(some_var)+'){'+str(n)+'}'
'(?:\\s*what\\ what){3}'
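On Python 3.6 or later, a raw f-string is a fourth way to build the same string. This is only a sketch of the same idea, reusing some_var and n from above; as with format, literal braces have to be doubled:

```python
import re

some_var = "what what"  # value from the question
n = 3                   # number of consecutive copies required

# rf"..." is a raw f-string: {{ and }} produce literal braces,
# while {re.escape(some_var)} and {n} are substituted.
pattern = rf"(?:\s*{re.escape(some_var)}){{{n}}}"
print(pattern)
```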

Any of these strings will now work as you expect (re.match only matches at the start of the string, which is enough for these inputs):

>>> re.match(r'(?:\s*%s){%i}' % (re.escape(some_var), n), should_match)
<_sre.SRE_Match object at 0x104244b28>
>>> re.match(r'(?:\s*%s){%i}' % (re.escape(some_var), n), not_a_match)
>>> 
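For reference, here is a self-contained version that uses re.search, as in the question, so the three copies may start anywhere in the string. The helper name repeated is made up for this illustration; it is not part of the re module:

```python
import re

def repeated(literal, n):
    """Hypothetical helper: regex for n consecutive copies of `literal`,
    each optionally preceded by whitespace."""
    return r"(?:\s*{0}){{{1}}}".format(re.escape(literal), n)

some_var = "what what"
should_match = "what what what what what what hey"
not_a_match = "what what what what hey what what"

# Compile once, then reuse the pattern for both strings.
pattern = re.compile(repeated(some_var, 3))

match = pattern.search(should_match)
no_match = pattern.search(not_a_match)

print(match is not None)  # True
print(no_match is None)   # True
```

In not_a_match the "hey" interrupts the run after two copies, so search finds nothing and returns None.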

Upvotes: 1

Wiktor Stribiżew

Reputation: 627292

You need to group those words so the quantifier applies to all of them, and allow optional whitespace before each copy:

match = re.search(r"(?:\s*{0}){{3}}".format(re.escape(some_var)), should_match)

See IDEONE demo

The regex will look like (?:\s*what\ what){3}, and this is how it works: it matches 3 sequences of

  • \s* - 0 or more whitespace characters, followed by
  • what\ what - literal what what substring.
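To see why both the group and the \s* matter, here is a small comparison, assuming the same some_var and should_match as in the question. The variable names ungrouped, without_ws and with_ws are just labels for this sketch:

```python
import re

some_var = "what what"
should_match = "what what what what what what hey"
escaped = re.escape(some_var)

# No group: {3} binds only to the final "t", so this matches "what whattt".
ungrouped = escaped + "{3}"
# Group but no whitespace: the copies would have to be glued together.
without_ws = r"(?:{0}){{3}}".format(escaped)
# Group plus optional leading whitespace: matches space-separated copies.
with_ws = r"(?:\s*{0}){{3}}".format(escaped)

print(re.search(ungrouped, "what whattt") is not None)  # True
print(re.search(ungrouped, should_match))               # None
print(re.search(without_ws, should_match))              # None
print(re.search(with_ws, should_match).group())         # what what what what what what
```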

Upvotes: 2
