smci

Reputation: 33960

How to write patterns for use with re.VERBOSE when they contain meaningful whitespace?

Regexes containing meaningful spaces break when re.VERBOSE is added, apparently because re.VERBOSE 'helpfully' strips the (meaningful) space inside 'Issue Summary' along with all the non-meaningful whitespace (e.g. padding and newlines inside a multiline pattern). (My use of re.VERBOSE with multiline patterns is non-negotiable: this is a massive simplification of a huge multiline regex, where re.VERBOSE is necessary just to stay sane.)

import re
re.match(r'''Issue Summary.*''', 'Issue Summary: fails', re.U|re.VERBOSE)
# No match!
re.match(r'''Issue Summary.*''', 'Issue Summary: passes', re.U)
<_sre.SRE_Match object at 0x10ba36030>
re.match(r'Issue Summary.*', 'Issue Summary: passes', re.U)
<_sre.SRE_Match object at 0x10b98ff38>

Is there a saner way to write re.VERBOSE-friendly patterns containing meaningful spaces, short of replacing each space in my pattern with '\s' or '.'? That is not just ugly but counter-intuitive and a pain to automate, and both are looser than a literal space ('\s' matches any whitespace, '.' matches almost any character).

re.match(r'Issue\sSummary.*', 'Issue Summary: fails', re.VERBOSE)
<_sre.SRE_Match object at 0x10ba36030>
re.match(r'Issue.Summary.*', 'Issue Summary: fails', re.VERBOSE)
<_sre.SRE_Match object at 0x10b98ff38>

(As an aside, this is a useful docbug catch on Python 2 and 3. I'll file it once I get consensus here on what the right solution is.)

Upvotes: 9

Views: 1836

Answers (1)

Tom Karzes

Reputation: 24062

If re.VERBOSE is used, then I think there's no choice other than to change the regular expression string. However, I would suggest one of the following:

r'abc\ def'

or:

r'abc[ ]def'

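Both r'\ ' and r'[ ]' match a single space character (not any whitespace, only an actual space). They work because re.VERBOSE leaves whitespace alone when it is preceded by a backslash or sits inside a character class. Note that, without the r in front, the backslash character would need to be doubled, i.e. \\.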
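To make this concrete, here is a minimal sketch applying both spellings to the 'Issue Summary' pattern from the question, written as multiline re.VERBOSE patterns. The names ESCAPED, BRACKETED, STRIPPED and AUTO are just illustrative. The last pattern is my own addition rather than part of the suggestion above: it relies on re.escape() backslash-escaping spaces, which may help if you want to generate the literal parts of the pattern automatically.

```python
import re

# Escaped space: under re.VERBOSE a backslash-escaped space stays literal.
ESCAPED = re.compile(r'''
    Issue\ Summary   # the heading, with one real space
    .*               # the rest of the line
''', re.U | re.VERBOSE)

# Character class: whitespace inside [...] is also preserved.
BRACKETED = re.compile(r'''
    Issue[ ]Summary  # same heading, space written as a one-character class
    .*               # the rest of the line
''', re.U | re.VERBOSE)

# Unescaped space is stripped, so this is effectively 'IssueSummary.*'.
STRIPPED = re.compile(r'''Issue Summary.*''', re.U | re.VERBOSE)

# Assumption-labelled extra: re.escape() escapes the space for us,
# so literal text can be dropped into a verbose pattern programmatically.
AUTO = re.compile(re.escape('Issue Summary') + r''' .* ''', re.U | re.VERBOSE)

line = 'Issue Summary: fails'
print(bool(ESCAPED.match(line)))     # escaped space matches
print(bool(BRACKETED.match(line)))   # bracketed space matches
print(bool(STRIPPED.match(line)))    # bare space was stripped, no match
print(bool(AUTO.match(line)))        # re.escape() version matches
```

Unlike '\s', neither spelling matches a tab or newline, so the pattern keeps the meaning it had without re.VERBOSE.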

Upvotes: 11
