Positive lookbehind with a matching group to be extracted

Question

testString = ("Tricks"
              "")
import re
re.sub("(?<=[(.+?)\s+])", "{{ \1 @ \2 }}", testString)

This produces: invalid group reference.

Making the replacement use only \1 extracts only envelope, which makes me think the lookbehind is ignored. Is there a way to extract something from a lookbehind?

I'd like to produce:

Tricks
{{ Tricks @ envelope }}

Martijn Pieters · Accepted Answer

Looks like you really want to use an HTML parser instead. Mixing regular expressions and HTML gets painful really fast.

In your regular expression, you created a character class (a set of characters that is allowed to match) consisting of <, h, 2, >, etc. here:

[(.+?)\s+]

which could have been written as:

[<>h2()+.?/\s]

and it would match the same characters.

Don't use [..] unless you want to create a set of characters for a match (\s, \d, etc. are pre-built character classes).
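To make the character-class point concrete, here is a small sketch. The sample strings are made up for illustration and are not the asker's original data.

```python
import re

# Inside [...], every character is just a member of a set, so the class
# built from "<h2>" is the four characters <, h, 2 and >.
as_written = re.compile(r"[<h2>]")
single_chars = as_written.findall("<h2>")
print(single_chars)  # ['<', 'h', '2', '>']

# Order inside the brackets does not matter: both classes match the same characters.
reordered = re.compile(r"[>2h<]")
sample = "<h2>Tricks</h2>"  # hypothetical sample text
same_matches = as_written.findall(sample) == reordered.findall(sample)
print(same_matches)  # True

# Grouping and quantifier characters lose their special meaning inside a class.
literal_specials = re.findall(r"[(.+?)]", "a+b (c)?")
print(literal_specials)  # ['+', '(', ')', '?']
```

Each match is a single character, which is why a group reference such as \1 has nothing to refer to when the parentheses sit inside the brackets.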

However, even if you were to remove the brackets, the lookbehind would not be allowed. Python's re module does not permit variable-width patterns in a lookbehind (no + or *). So with the character class the lookbehind no longer matches what you think it matches, and without it the lookbehind is not permissible.
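The fixed-width restriction can be checked directly. The sketch below assumes a hypothetical markup string (the asker's original markup is not recoverable here) and also shows one common workaround that is not part of this answer: put the text you wanted in the lookbehind into an ordinary capturing group and write it back in the replacement.

```python
import re

# Hypothetical markup, assumed for illustration only.
test_string = '<h2>Tricks</h2> <i class="icon-envelope"></i>'

# A lookbehind containing + (or *) is rejected when the pattern is compiled.
try:
    re.compile(r'(?<=<h2>(.+?)</h2>\s+)<i')
    lookbehind_compiled = True
except re.error:
    lookbehind_compiled = False
print(lookbehind_compiled)  # False

# Workaround sketch: capture the heading in a normal group and re-emit it.
pattern = re.compile(r'(<h2>(.+?)</h2>)\s+<i class="icon-(.+?)"></i>')
result = pattern.sub(r"\1\n{{ \2 @ \3 }}", test_string)
print(result)
```

This keeps the heading in the output because group 1 is written back explicitly, which is the effect the lookbehind was meant to achieve. It remains fragile against real-world HTML, so treat it as a demonstration of the regex mechanics only.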

All in all, just use BeautifulSoup instead.
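As a rough illustration of the parser approach, the sketch below uses the standard library's html.parser rather than BeautifulSoup, purely so that it runs without a third-party install. The markup, the class name HeadingIconParser, and the icon- prefix convention are assumptions made for this example.

```python
from html.parser import HTMLParser


class HeadingIconParser(HTMLParser):
    """Collects the text of <h2> and the suffix of any icon-* class on <i>."""

    def __init__(self):
        super().__init__()
        self.in_heading = False
        self.heading = ""
        self.icon = None

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.in_heading = True
        elif tag == "i":
            # attrs is a list of (name, value) pairs; value may be None.
            for name in (dict(attrs).get("class") or "").split():
                if name.startswith("icon-"):
                    self.icon = name[len("icon-"):]

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_heading = False

    def handle_data(self, data):
        if self.in_heading:
            self.heading += data


# Hypothetical markup, assumed for illustration only.
test_string = '<h2>Tricks</h2> <a href="#"><i class="icon-envelope"></i></a>'

parser = HeadingIconParser()
parser.feed(test_string)
parser.close()

template = "{{ %s @ %s }}" % (parser.heading, parser.icon)
print(template)  # {{ Tricks @ envelope }}
```

With BeautifulSoup the same extraction is typically a few lines shorter, since it builds a tree you can query instead of requiring handler methods.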
