Jan

Reputation: 43169

Setting up verbose regex

While trying to set up a verbose regex:

import re

# set up variables
ankerwords = ['beerdigt','bestattet','begraben','beigesetzt']

# combine the words, five words before/after
rx = re.compile(r'''
    (?:\b\w+\W+){5} # five words before
    (?:{})
    (?:\W+\w+\b){5} # five words thereafter
    '''.format("|".join(ankerwords)), re.X)

This throws IndexError: tuple index out of range.

I know it's because str.format() treats the {5} in the expression as a replacement field, but how can I get around it without splitting the string into several parts, i.e.

'''(?:\b\w+\W+){5}''' + '(?:{})'.format(...)

It's more a question of style, really.

Upvotes: 3

Views: 95

Answers (2)

bphi

Reputation: 3195

Jean covered pretty much every way of escaping curly braces quite well. The only thing I'd add is that if your concern is stylistic, and you have the luxury of using Python 3.6+, you can make it slightly more readable with an f-string:

rx = re.compile(fr'''
    (?:\b\w+\W+){{5}} # five words before
    (?:{"|".join(ankerwords)})
    (?:\W+\w+\b){{5}} # five words thereafter
    ''', re.X)
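As a small variation on the f-string above, the alternation can be built in a variable first, so the replacement field holds only a plain name. This is just a sketch: the variable name alternation and the sample sentence are made up for illustration.

```python
import re

ankerwords = ['beerdigt', 'bestattet', 'begraben', 'beigesetzt']

# Hypothetical helper variable: build the alternation once, outside the f-string.
alternation = "|".join(ankerwords)

# Doubled braces become a literal {5}; {alternation} is interpolated.
rx = re.compile(fr'''
    (?:\b\w+\W+){{5}}   # five words before
    (?:{alternation})   # one of the anchor words
    (?:\W+\w+\b){{5}}   # five words thereafter
    ''', re.X)

# Made-up sample text with five words on each side of the anchor word.
sample = "a b c d e bestattet f g h i j"
match = rx.search(sample)
```

Here match.group(0) should span the whole sample string, since it has exactly five words on each side of "bestattet".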

Upvotes: 3

Jean-François Fabre

Reputation: 140188

Doubling the braces works: it tells format to treat the curly braces as literal characters (it escapes them; see "How can I print literal curly-brace characters in python string and also use .format on it?"):

rx = re.compile(r'''
    (?:\b\w+\W+){{5}} # five words before
    (?:{})
    (?:\W+\w+\b){{5}} # five words thereafter
    '''.format("|".join(ankerwords)), re.X)
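To see what the doubled braces turn into, it can help to print the formatted string before compiling it. The sketch below uses a one-line version of the same pattern (no verbose mode) and a made-up sample sentence; the names template, pattern and sample are only for illustration.

```python
import re

ankerwords = ['beerdigt', 'bestattet', 'begraben', 'beigesetzt']

# {{5}} is an escaped literal {5}; the lone {} is the only replacement field.
template = r'(?:\b\w+\W+){{5}}(?:{})(?:\W+\w+\b){{5}}'
pattern = template.format("|".join(ankerwords))
print(pattern)
# (?:\b\w+\W+){5}(?:beerdigt|bestattet|begraben|beigesetzt)(?:\W+\w+\b){5}

# Made-up sample text: five words, an anchor word, five more words.
sample = "eins zwei drei vier fuenf begraben sechs sieben acht neun zehn"
match = re.search(pattern, sample)
```

After formatting, the regex engine only ever sees the single-brace quantifier {5}.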

Or use old-style % formatting, which leaves the braces alone:

rx = re.compile(r'''
    (?:\b\w+\W+){5} # five words before
    (?:%s)
    (?:\W+\w+\b){5} # five words thereafter
    ''' % ("|".join(ankerwords)), re.X)
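One caveat with % formatting: the escaping problem moves from braces to percent signs, so a literal % in the pattern has to be written as %%. A minimal sketch with a made-up pattern (two percentages followed by an anchor word), not taken from the question:

```python
import re

ankerwords = ['beerdigt', 'bestattet', 'begraben', 'beigesetzt']

# {2} needs no escaping here, but a literal percent sign must be doubled.
template = r'(?:\d+%%\s+){2}(?:%s)'
pattern = template % "|".join(ankerwords)
print(pattern)
# (?:\d+%\s+){2}(?:beerdigt|bestattet|begraben|beigesetzt)

match = re.fullmatch(pattern, "10% 20% begraben")
```

So this style is the more convenient one only as long as the regex itself contains no percent signs.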

Another way in this case, since the {5} is repeated, is to pass the quantifier in as a named field:

rx = re.compile(r'''
    (?:\b\w+\W+){five} # five words before
    (?:{expr})
    (?:\W+\w+\b){five} # five words thereafter
    '''.format(expr="|".join(ankerwords), five="{5}"), re.X)

(This avoids doubling the braces and lets you "parametrize" the number of words once and for all.)
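Taking the parametrization one step further, the word count can become a function argument. This is only a sketch: build_context_regex and its parameters are hypothetical names, and wrapping the anchor words in re.escape is an added assumption (it is harmless for plain words and protects words containing regex metacharacters).

```python
import re

def build_context_regex(words, n=5):
    """Compile a verbose regex matching n words before and after any of `words`."""
    quantifier = "{%d}" % n  # e.g. "{5}", passed in as a named field
    # Assumption: anchor words are literals, so escape them.
    alternation = "|".join(re.escape(w) for w in words)
    return re.compile(r'''
        (?:\b\w+\W+){count}   # n words before
        (?:{expr})            # one of the anchor words
        (?:\W+\w+\b){count}   # n words thereafter
        '''.format(expr=alternation, count=quantifier), re.X)

ankerwords = ['beerdigt', 'bestattet', 'begraben', 'beigesetzt']
rx = build_context_regex(ankerwords, n=2)

# Made-up sample sentence for illustration.
match = rx.search("er wurde gestern feierlich beigesetzt in der Stadt")
print(match.group(0))
# gestern feierlich beigesetzt in der
```

The template itself needs no doubled braces, because every pair of braces in it is a real replacement field.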

Upvotes: 4
