Master DJon

Reputation: 1965

PHP PCRE - Regex upgraded now failing (Catastrophic backtracking) + optimization

I first posted this question: Regex matching nested beginning and ending tags

It was answered perfectly by Wiktor Stribiżew. I then wanted to upgrade my regex so that my parameters support a JSON object (or almost: a lone '{' or '[' isn't supported).

I have two expressions: one for paired tags and one for lonely tags. I apply the paired one first and, once all of its replacements are done, I run the lonely one. The modified lonely expression works fine on regex101.com (https://www.regex101.com/r/HIEQZk/9), but for the paired one regex101 reports "catastrophic backtracking" (https://www.regex101.com/r/HIEQZk/8), even though it doesn't crash in PHP.

Could anyone help me optimize or fix this fairly large regex? Some of the escaping may look unnecessary, but the begin/end markers and the splitter can be customized, so they have to be escaped. (The paired expression is less heavily escaped because it is not the one generated by PHP; it is the one written by Wiktor Stribiżew, with my modifications.)

The only part I think needs optimizing or fixing is the "parameters" group, which I just modified to support JSON objects. (Tests of it can be seen in the earlier versions of the same regex101 URL; the current versions use real HTML as input.)

Lonely expression

~
 \{\{ #Instruction start

   ([^\^\{\}]+) # (Group 1) Instruction name OR variable to reach if nothing else after then
   (?:
     \^
     (?:([^\\^\{\}]*)\^)? #(Group 2) Specific delimiter
     ([^\{\}]*{(?:[^{}\[\]]+|(?3))+}[^\{\}]*|[^\{\}]*\[(?:[^{}\[\]]+|(?3))+\][^\{\}]*|[^\{\}]+) # (Group 3) Parameters
   )?

 \}\} #Instruction end
~xg

Paired expression

~{{             # Opening tag start
  (\w+)         # (Group 1) Tag name
  (?:           # Not captured group for optional parameters
   (?:          # Not captured group for optional delimiter
    \^          # Aux delimiter
    ([^^\{\}]?) # (Group 2) Specific delimiter
   )?
   \^           # Aux delimiter
   ([^\{\}]*{(?:[^{}\[\]]+|(?3))+}[^\{\}]*|[^\{\}]*\[(?:[^{}\[\]]+|(?3))+\][^\{\}]*|[^\{\}]+)   # (Group 3) Parameters
  )?
 }}             # Opening tag end
  (             # (Group 4)
   (?>          
     (?R)       # Repeat the whole pattern
     |          # or match all that is not the opening/closing tag
     [^{]*(?:\{(?!{/?\1[^\{\}]*}})[^{]*)*
   )*           # Zero or more times
  )
 {{/\1}}        # Closing tag
~ix
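To make the two-pass workflow concrete, here is a minimal PHP sketch, assuming PHP 8+ (for preg_last_error_msg()). It runs the paired expression repeatedly until nothing matches, then runs the lonely expression once. The `render()` function, the `if` tag and the `$vars` lookup are hypothetical illustrations, not part of the original templates. The lonely expression's `g` flag is dropped because PHP has no `g` modifier, and preg_replace_callback() already replaces every match.

```php
<?php
// Hypothetical driver for the two-pass approach: paired tags first (repeated,
// since replacing an outer tag can leave nested ones to handle), then lonely tags.
// Assumes PHP 8+ for preg_last_error_msg().

// Paired expression, copied from the question.
$paired = <<<'RE'
~{{             # Opening tag start
  (\w+)         # (Group 1) Tag name
  (?:           # Not captured group for optional parameters
   (?:          # Not captured group for optional delimiter
    \^          # Aux delimiter
    ([^^\{\}]?) # (Group 2) Specific delimiter
   )?
   \^           # Aux delimiter
   ([^\{\}]*{(?:[^{}\[\]]+|(?3))+}[^\{\}]*|[^\{\}]*\[(?:[^{}\[\]]+|(?3))+\][^\{\}]*|[^\{\}]+)   # (Group 3) Parameters
  )?
 }}             # Opening tag end
  (             # (Group 4)
   (?>
     (?R)       # Repeat the whole pattern
     |          # or match all that is not the opening/closing tag
     [^{]*(?:\{(?!{/?\1[^\{\}]*}})[^{]*)*
   )*           # Zero or more times
  )
 {{/\1}}        # Closing tag
~ix
RE;

// Lonely expression from the question, without the regex101-only "g" flag.
$lonely = <<<'RE'
~
 \{\{ #Instruction start
   ([^\^\{\}]+) # (Group 1) Instruction name OR variable to reach if nothing else after then
   (?:
     \^
     (?:([^\\^\{\}]*)\^)? #(Group 2) Specific delimiter
     ([^\{\}]*{(?:[^{}\[\]]+|(?3))+}[^\{\}]*|[^\{\}]*\[(?:[^{}\[\]]+|(?3))+\][^\{\}]*|[^\{\}]+) # (Group 3) Parameters
   )?
 \}\} #Instruction end
~x
RE;

function render(string $html, array $vars, string $paired, string $lonely): string
{
    // Pass 1: keep applying the paired expression until it no longer matches.
    do {
        $html = preg_replace_callback($paired, function (array $m) use ($vars): string {
            // $m[1] = tag name, $m[3] = parameters ('' if absent), $m[4] = content.
            if (strtolower($m[1]) === 'if') {
                return empty($vars[$m[3]]) ? '' : $m[4]; // hypothetical "if" tag
            }
            return $m[4]; // any other tag: drop the tags, keep the content
        }, $html, -1, $count);
        if ($html === null) {
            // On a PCRE error (e.g. backtrack limit) preg_* returns null instead of crashing.
            throw new RuntimeException(preg_last_error_msg());
        }
    } while ($count > 0);

    // Pass 2: replace each lonely tag with the variable named in Group 1.
    $html = preg_replace_callback($lonely, function (array $m) use ($vars): string {
        return (string) ($vars[trim($m[1])] ?? '');
    }, $html);
    if ($html === null) {
        throw new RuntimeException(preg_last_error_msg());
    }
    return $html;
}

echo render('Hi {{if^admin}}[admin] {{/if}}{{name}}', ['admin' => true, 'name' => 'Ann'], $paired, $lonely), "\n";
// Hi [admin] Ann
```

Because preg_replace_callback() returns `null` rather than throwing when PCRE gives up, checking the return value (or preg_last_error()) is one way to notice in PHP that a pattern flagged for catastrophic backtracking actually failed on a given input.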

Upvotes: 0

Views: 67

Answers (1)

bobble bubble

Reputation: 18545

Try replacing your (?: non-capturing groups with (?> atomic groups to prevent or reduce backtracking wherever possible, that is, wherever you don't want the engine to go back and try other ways of matching. Atomic groups are non-capturing as well. You can also experiment with possessive quantifiers while watching the step counter and debugger on regex101.

Here is your updated demo, in which I only changed the first (?: to (?>.
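A small PHP sketch of what the advice above means in practice. The first part shows that an atomic group or possessive quantifier never gives characters back, which is why it cuts backtracking but only fits places where giving back is never needed. The second part applies the idea to the parameters group on its own; making `[^{}\[\]]+` possessive is an assumption about where to apply the advice (the answer only changed "the first (?:"), and the escaping is simplified for readability.

```php
<?php
// 1) Non-capturing vs atomic vs possessive on a tiny input.
// Non-capturing group: a+ can give back one "a", so the literal "ab" still matches.
$nonCapturing = preg_match('/^(?:a+)ab$/', 'aaab'); // 1
// Atomic group: once a+ has taken "aaa" it never gives any back, so "ab" fails.
$atomic = preg_match('/^(?>a+)ab$/', 'aaab'); // 0
// Possessive quantifier: shorthand for the same atomic behaviour.
$possessive = preg_match('/^a++ab$/', 'aaab'); // 0

// 2) The parameters group (Group 3) on its own: escaping simplified and
// (?3) renumbered to (?1), since it is now the first capturing group.
$paramsOriginal = '~^([^{}]*\{(?:[^{}\[\]]+|(?1))+\}[^{}]*|[^{}]*\[(?:[^{}\[\]]+|(?1))+\][^{}]*|[^{}]+)$~';
// Same group with the inner run made possessive ([^{}\[\]]++), so the outer +
// can no longer split one run of ordinary characters across several iterations.
$paramsPossessive = '~^([^{}]*\{(?:[^{}\[\]]++|(?1))+\}[^{}]*|[^{}]*\[(?:[^{}\[\]]++|(?1))+\][^{}]*|[^{}]+)$~';

$results = [];
foreach (['{"a":{"b":[1,2]}}', 'plain text', 'a{b'] as $sample) {
    $results[$sample] = [preg_match($paramsOriginal, $sample), preg_match($paramsPossessive, $sample)];
    printf("%s: original=%d possessive=%d\n", $sample, $results[$sample][0], $results[$sample][1]);
}
```

Both variants accept the nested JSON object and plain text and reject the lone `{` here. The sketch does not count steps, so whether this change alone removes the regex101 warning on the real HTML would still need checking in regex101's debugger.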

Upvotes: 1
