Reputation: 43169
While trying to set up a verbose regex:
# set up variables
ankerwords = ['beerdigt','bestattet','begraben','beigesetzt']
# combine the words, five words before/after
rx = re.compile(r'''
(?:\b\w+\W+){5} # five words before
(?:{})
(?:\W+\w+\b){5} # five words thereafter
'''.format("|".join(ankerwords)), re.X)
This throws an error: IndexError: tuple index out of range. The cause is the {5} in the expression, but how do I get around it without splitting the string into several parts, e.g.
'''(?:\b\w+\W+){5}''' + '(?:{})'.format(...)
It's more a question of style, really.
Upvotes: 3
Views: 95
Reputation: 3195
Jean covered pretty much every way of escaping curly braces quite well. The only thing I'd add is that if your concern is stylistic and you have the luxury of using Python 3.6+, you can make it slightly more readable with an f-string:
rx = re.compile(fr'''
(?:\b\w+\W+){{5}} # five words before
(?:{"|".join(ankerwords)})
(?:\W+\w+\b){{5}} # five words thereafter
''', re.X)
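To check that the f-string version behaves as intended, here is a minimal sketch that runs it against a sample sentence. The sentence is made up for illustration and is not from the original post; the extra comment on the anchor-word line is also an addition.

```python
import re

ankerwords = ['beerdigt', 'bestattet', 'begraben', 'beigesetzt']

# In an f-string, {{5}} becomes the literal quantifier {5},
# while {"|".join(...)} is evaluated and interpolated.
rx = re.compile(fr'''
    (?:\b\w+\W+){{5}}           # five words before
    (?:{"|".join(ankerwords)})  # one of the anchor words
    (?:\W+\w+\b){{5}}           # five words thereafter
''', re.X)

# Hypothetical sample sentence (assumption, not from the question).
text = "Der alte Mann wurde gestern beerdigt und die Familie war traurig."
match = rx.search(text)
print(match.group(0))
```

The match should span the five words on either side of the anchor word, without the trailing period.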
Upvotes: 3
Reputation: 140188
Doubling the braces works: it tells format to treat the curly braces as literal characters (it escapes them; see "How can I print literal curly-brace characters in python string and also use .format on it?"):
rx = re.compile(r'''
(?:\b\w+\W+){{5}} # five words before
(?:{})
(?:\W+\w+\b){{5}} # five words thereafter
'''.format("|".join(ankerwords)), re.X)
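A small sketch of what happens under the hood may help here. It uses shortened, single-line patterns (an assumption made for brevity) to show that doubled braces come out of .format() as one literal pair, whereas a single-braced {5} is read as a request for positional argument number 5.

```python
# Doubled braces survive .format() as a single literal pair.
template = r'(?:\b\w+\W+){{5}}(?:{})'
pattern = template.format("beerdigt|bestattet")
print(pattern)  # (?:\b\w+\W+){5}(?:beerdigt|bestattet)

# With single braces, {5} is treated as a replacement field
# asking for positional argument 5, which was never supplied.
error = None
try:
    r'(?:\b\w+\W+){5}(?:{})'.format("beerdigt|bestattet")
except IndexError as exc:
    error = exc
    print(type(exc).__name__)  # IndexError
```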
Or use old-style % formatting:
rx = re.compile(r'''
(?:\b\w+\W+){5} # five words before
(?:%s)
(?:\W+\w+\b){5} # five words thereafter
''' % ("|".join(ankerwords)), re.X)
Another way in this case, since the {5} is repeated, is to pass it in as a named field, maybe like this:
rx = re.compile(r'''
(?:\b\w+\W+){five} # five words before
(?:{expr})
(?:\W+\w+\b){five} # five words thereafter
'''.format(expr="|".join(ankerwords), five="{5}"), re.X)
(This avoids doubling the braces and lets you "parametrize" the number of words in one place.)
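Taking the parametrized idea one step further, the pattern could be wrapped in a small function. This is only a sketch: build_context_regex and its parameter n are hypothetical names, and the re.escape call is an extra precaution that the original answer does not include (it is a no-op for the plain words used here).

```python
import re

def build_context_regex(words, n=5):
    """Hypothetical helper: match one of `words` with n words on each side."""
    # Escape each word so regex metacharacters in it are taken literally.
    alternation = "|".join(re.escape(w) for w in words)
    return re.compile(r'''
        (?:\b\w+\W+){count}   # n words before
        (?:{expr})            # one of the anchor words
        (?:\W+\w+\b){count}   # n words thereafter
    '''.format(expr=alternation, count="{%d}" % n), re.X)

ankerwords = ['beerdigt', 'bestattet', 'begraben', 'beigesetzt']
rx = build_context_regex(ankerwords, n=2)

# Hypothetical sample sentence (assumption, not from the question).
result = rx.search("Er wurde gestern beerdigt und dann vergessen.")
print(result.group(0))
```

Because the quantifier is built from n in one place, changing the context width does not require touching the pattern itself.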
Upvotes: 4