Reputation: 33960
Regexes containing meaningful spaces break when re.VERBOSE is added, apparently because re.VERBOSE 'helpfully' discards the (meaningful) whitespace inside 'Issue Summary' along with all the non-meaningful whitespace (e.g. padding and newlines inside a multiline pattern). (My use of re.VERBOSE with a multiline pattern is non-negotiable: this is a massive simplification of a huge multiline regex where re.VERBOSE is necessary just to stay sane.)
import re
re.match(r'''Issue Summary.*''', 'Issue Summary: fails', re.U|re.VERBOSE)
# No match! (returns None)
re.match(r'''Issue Summary.*''', 'Issue Summary: passes', re.U)
<_sre.SRE_Match object at 0x10ba36030>
re.match(r'Issue Summary.*', 'Issue Summary: passes', re.U)
<_sre.SRE_Match object at 0x10b98ff38>
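For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same behaviour. It assumes Python 3, where str patterns are Unicode by default (so re.U is dropped), and it checks results with assertions instead of relying on the REPL's match-object repr, which differs between versions.

```python
import re

text = 'Issue Summary: fails'

# Under re.VERBOSE the unescaped space in the pattern is discarded, so the
# pattern effectively becomes 'IssueSummary.*' and does not match the text.
assert re.match(r'Issue Summary.*', text, re.VERBOSE) is None

# Without re.VERBOSE the space is an ordinary literal character.
assert re.match(r'Issue Summary.*', text) is not None

# The VERBOSE pattern does match input that has no space at all.
assert re.match(r'Issue Summary.*', 'IssueSummary: x', re.VERBOSE) is not None
```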
Is there a saner way to write re.VERBOSE-friendly patterns that contain meaningful spaces, short of replacing each space in my pattern with '\s' or '.'? That is not just ugly but counter-intuitive (both match more than a single space character), and a pain to automate.
re.match(r'Issue\sSummary.*', 'Issue Summary: fails', re.VERBOSE)
<_sre.SRE_Match object at 0x10ba36030>
re.match(r'Issue.Summary.*', 'Issue Summary: fails', re.VERBOSE)
<_sre.SRE_Match object at 0x10b98ff38>
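To make the "counter-intuitive" point concrete, here is a small sketch (Python 3 assumed; the sample strings are made up for illustration) showing that both workarounds accept the text above but are looser than a literal space: \s also accepts a tab or newline, and . accepts almost any character.

```python
import re

# Both workarounds match the text from the question under re.VERBOSE ...
for pattern in (r'Issue\sSummary.*', r'Issue.Summary.*'):
    assert re.match(pattern, 'Issue Summary: x', re.VERBOSE)

# ... but they also match input a literal space would reject.
assert re.match(r'Issue\sSummary', 'Issue\tSummary', re.VERBOSE)  # tab
assert re.match(r'Issue\sSummary', 'Issue\nSummary', re.VERBOSE)  # newline
assert re.match(r'Issue.Summary', 'Issue_Summary', re.VERBOSE)    # any char

# A plain (non-VERBOSE) literal space rejects the tab.
assert re.match(r'Issue Summary', 'Issue\tSummary') is None
```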
(As an aside, this is a useful doc bug catch for both Python 2 and 3. I'll file it once I get consensus here on what the right solution is.)
Upvotes: 9
Views: 1836
Reputation: 24062
If re.VERBOSE is used, then I think there's no choice other than to change the regular expression string. However, I would suggest one of the following:
r'abc\ def'
or:
r'abc[ ]def'
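Both r'\ ' and '[ ]' match a single space character (not any whitespace, only an actual space), because re.VERBOSE keeps whitespace that is preceded by a backslash or that appears inside a character class. Note that, without the r in front, the backslash would need to be doubled, i.e. '\\ '.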
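As a sketch of how this looks in a real multiline VERBOSE pattern (Python 3 assumed; ISSUE_LINE and the summary group are hypothetical names for illustration), both forms can sit alongside comments and indentation. For the "pain to automate" part, the standard library's re.escape backslash-escapes spaces, so literal text run through it keeps its spaces when spliced into a VERBOSE pattern; treat that as one possible approach rather than the definitive fix.

```python
import re

# Hypothetical multiline VERBOSE pattern using both forms of a literal space.
ISSUE_LINE = re.compile(r'''
    ^Issue\ Summary      # escaped space: matches exactly one ' '
    :[ ]                 # a space inside a character class is kept too
    (?P<summary>.*)$     # rest of the line
''', re.VERBOSE)

m = ISSUE_LINE.match('Issue Summary: fails')
assert m is not None and m.group('summary') == 'fails'

# Unlike \s, neither form matches a tab.
assert ISSUE_LINE.match('Issue\tSummary: fails') is None

# re.escape turns ' ' into '\ ', which re.VERBOSE preserves.
prefix = re.escape('Issue Summary')
assert prefix == r'Issue\ Summary'
assert re.match(prefix + r':.*', 'Issue Summary: fails', re.VERBOSE)
```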
Upvotes: 11