Zout

Reputation: 924

Python look-behind regex issue: Invalid regular expression: look-behind requires fixed-width pattern

I need to match a line break that falls between double quotes, as in:

<p class="calibre1">“This is the first sentence.</p>
<p class="calibre1">And this is the second!”</p>

This would match </p> <p class="calibre1">

I got this working with the regex (?<=“[^”]*)</p>\s*<p[^>]*>(?!“), but when I try to use it non-manually I get the error from the title: "Invalid regular expression: look-behind requires fixed-width pattern". I need this regex for Calibre, the eBook management and editing program, which uses Python for its regex engine. The regex works when I search a book manually, but when I add it as a "common option" (run on every eBook conversion) I get that error.

I don't see how to do this without a variable-width look-behind, since you can't know how far the left double quote is from the line break. Any help would be much appreciated!

Upvotes: 3

Views: 2903

Answers (2)

Robin

Reputation: 9644

Python's re module, like most regex engines (with the notable exception of .NET), doesn't support variable-length lookbehind.
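To see the restriction in isolation, here is a minimal sketch that compiles two lookbehinds with the standard re module. The helper name `compiles` and the fixed-width test pattern are made up for illustration; only the first pattern comes from the question.

```python
import re

def compiles(pattern):
    """Return True if the re module accepts the pattern, False otherwise."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        # Raised with "look-behind requires fixed-width pattern"
        # when the lookbehind body can match a varying number of characters.
        return False

# Lookbehind from the question: [^”]* can match any number of characters.
variable_ok = compiles(r'(?<=“[^”]*)</p>')

# Hypothetical fixed-width lookbehind: always exactly four characters.
fixed_ok = compiles(r'(?<=“abc)</p>')

print(variable_ok, fixed_ok)
```

The first pattern is rejected at compile time, before any text is searched, which is why the error shows up as soon as the expression is registered.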

Can't you use a capturing group instead?

“[^”]*(</p>\s*<p[^>]*>)

The line break you want is then in the first capturing group.
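A rough sketch of how this could be used from Python, with the sample from the question. The second group around the quoted prefix is my own addition (an assumption, not part of the pattern above) so that the prefix can be put back when replacing; the variable names are illustrative only.

```python
import re

html = (
    '<p class="calibre1">“This is the first sentence.</p>\n'
    '<p class="calibre1">And this is the second!”</p>'
)

# Group 1: the opening quote and the text after it (added for replacement).
# Group 2: the paragraph break that sits inside the quotation.
pattern = re.compile(r'(“[^”]*)(</p>\s*<p[^>]*>)')

match = pattern.search(html)
paragraph_break = match.group(2)

# Replace the break with a single space, keeping the quoted prefix.
joined = pattern.sub(r'\1 ', html)

print(repr(paragraph_break))
print(joined)
```

Because [^”]* is greedy, each pass only reaches the last break inside a given quotation, so a quotation spanning three or more paragraphs would need the substitution applied repeatedly.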

Upvotes: 3

aelor

Reputation: 11116

In Python's re module, the pattern inside a lookbehind must be fixed-width, so variable-length quantifiers such as * and + are not allowed inside it.

Upvotes: 0
