Setting up verbose regex

Question

While trying to set up a verbose regex:

import re

# set up variables
ankerwords = ['beerdigt','bestattet','begraben','beigesetzt']

# combine the words, five words before/after
rx = re.compile(r'''
    (?:\b\w+\W+){5} # five words before
    (?:{})
    (?:\W+\w+\b){5} # five words thereafter
    '''.format("|".join(ankerwords)), re.X)

This raises IndexError: tuple index out of range.

I know it's because of the {5} in the expression, but how can I get around it without splitting the string into several parts, e.g.

'''(?:\b\w+\W+){5}''' + '(?:{})'.format(...)

It's more a question of style, really.

Jean-François Fabre · Accepted Answer

Doubling the braces works: it tells format to treat the curly braces as literal characters, i.e. it escapes them (see: How can I print literal curly-brace characters in python string and also use .format on it?):

rx = re.compile(r'''
    (?:\b\w+\W+){{5}} # five words before
    (?:{})
    (?:\W+\w+\b){{5}} # five words thereafter
    '''.format("|".join(ankerwords)), re.X)
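
For illustration, here is a minimal self-contained sketch of why the original fails and what the doubled braces turn into. The sample sentence is made up for the example, and the names error, pattern, text and m are just placeholders.

```python
import re

ankerwords = ['beerdigt', 'bestattet', 'begraben', 'beigesetzt']

# str.format reads {5} as "positional argument number 5". Only one
# argument is supplied, so the lookup fails with an IndexError.
try:
    r'(?:\b\w+\W+){5}'.format('x')
except IndexError as exc:
    error = exc

# Doubled braces are emitted as single literal braces, so the regex
# engine still receives the quantifier {5}.
pattern = r'''
    (?:\b\w+\W+){{5}}   # five words before
    (?:{})              # one of the anchor words
    (?:\W+\w+\b){{5}}   # five words thereafter
    '''.format("|".join(ankerwords))
rx = re.compile(pattern, re.X)

# Made-up sample sentence, only for this sketch.
text = ("Er wurde am Montag in Berlin feierlich beigesetzt "
        "und von vielen Freunden betrauert worden.")
m = rx.search(text)
print(m.group(0))
# am Montag in Berlin feierlich beigesetzt und von vielen Freunden betrauert
```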

Or use old-style % formatting:

rx = re.compile(r'''
    (?:\b\w+\W+){5} # five words before
    (?:%s)
    (?:\W+\w+\b){5} # five words thereafter
    ''' % ("|".join(ankerwords)), re.X)
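
One caveat with the % variant, sketched below: the braces need no escaping, but a literal percent sign in the pattern has to be doubled instead. The percent-matching part of this pattern is invented purely to show that; it is not part of the original question.

```python
import re

ankerwords = ['beerdigt', 'bestattet', 'begraben', 'beigesetzt']

# With %-formatting, {5}-style quantifiers pass through untouched,
# but a literal percent sign must be written as a doubled one.
# (Comments inside the pattern string must not contain a single
# percent sign either, since they are part of the formatted string.)
rx = re.compile(r'''
    \d+\s*%%\s+   # a number followed by a literal percent sign (illustrative)
    (?:%s)        # one of the anchor words
    ''' % "|".join(ankerwords), re.X)

m = rx.search("40 % bestattet")
print(m.group(0))
```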

Another option in this case, since the {5} is repeated, is to pass the quantifier in as a named argument:

rx = re.compile(r'''
    (?:\b\w+\W+){five} # five words before
    (?:{expr})
    (?:\W+\w+\b){five} # five words thereafter
    '''.format(expr="|".join(ankerwords), five="{5}"), re.X)

(This avoids doubling the braces and lets you parametrize the number of words in one place.)
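
Taking the parametrization one step further, a rough sketch could wrap the pattern in a function that takes the word count as an integer. build_context_regex is a hypothetical helper name, and the use of re.escape on the anchor words is an addition of this sketch (it only matters if a word contains regex metacharacters, whitespace or #).

```python
import re

def build_context_regex(words, n=5):
    """Hypothetical helper: match n words before and after any of `words`."""
    # re.escape keeps metacharacters, spaces and '#' in the words literal,
    # which matters because the pattern is compiled in verbose mode.
    alternation = "|".join(re.escape(w) for w in words)
    # Triple braces: the outer doubled pair yields literal braces and the
    # inner field inserts n, so n=5 produces the quantifier written as 5
    # inside single braces.
    return re.compile(r'''
        (?:\b\w+\W+){{{n}}}   # n words before
        (?:{expr})            # one of the anchor words
        (?:\W+\w+\b){{{n}}}   # n words thereafter
        '''.format(expr=alternation, n=n), re.X)

ankerwords = ['beerdigt', 'bestattet', 'begraben', 'beigesetzt']
rx = build_context_regex(ankerwords, n=2)

# Made-up sample sentence, only for this sketch.
m = rx.search("Er wurde in Berlin beigesetzt und danach betrauert.")
print(m.group(0))
# in Berlin beigesetzt und danach
```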
