Reputation: 75
I have the regex below, which is written for the PCRE/PCRE2 flavor. However, it fails with the error “Evaluation takes too long. Please check your regular expression.” Is there any way to improve the performance of this regex by simplifying it?
The regex also triggers a "catastrophic backtracking" error.
(\border\D*\W*)\d+(*SKIP)(*F)|(\border\D*number\W*)\d+(*SKIP)(*F)|(?<!x)(?=(?:[._ –-]*\d){9})(?!9|66\D*6|00\D*0|(?:\d\D*){3}0\D*0|(?:\d\D*){5}0(?:\D*0){3})\d(?:[._ –-]*\d){4}
The above regex has a set of rules in it. Please find the requirements of the regex.
Regex101 gives me exactly the expected output, but the regex has a performance issue, so I need to simplify it.
Upvotes: 2
Views: 380
Reputation: 89564
Other approach: expand the pattern!
~(*UTF)
x (?<! \w x ) [\d._ –-]* (*SKIP) (*F)
|
order (?<! \w order ) [\W_]* (?:number)? [\W_]* [\d._ –-]* (*SKIP) (*F)
|
(?<res>
[1-578] (?: [._ –-]{0,3}+ \d ){2} (?: 0{2} [\d._ –-]* (*SKIP) (*F) )?
(?: [._ –-]{0,3}+ \d ){2}
)
(?: 0{4} [\d._ –-]* (*SKIP) (*F) )? (?: [._ –-]{0,3}+ \d ){4}
|
(?<res>
0 (?: 0{2} [\d._ –-]* (*SKIP) (*F) )? (?: [._ –-]{0,3}+ \d ){2}
(?: 0{2} [\d._ –-]* (*SKIP) (*F) )? (?: [._ –-]{0,3}+ \d ){2}
)
(?: 0{4} [\d._ –-]* (*SKIP) (*F) )? (?: [._ –-]{0,3}+ \d ){4}
|
(?<res>
6 (?: 6{2} [\d._ –-]* (*SKIP) (*F) )? (?: [._ –-]{0,3}+ \d ){2}
(?: 0{2} [\d._ –-]* (*SKIP) (*F) )? (?: [._ –-]{0,3}+ \d ){2}
)
(?: 0{4} [\d._ –-]* (*SKIP) (*F) )? (?: [._ –-]{0,3}+ \d ){4}
|
9 [\d._ –-]* (*SKIP) (*F)
~iJx
The pattern is indeed longer, but it is about 2 times faster and takes 8 times fewer steps.
Note that I use a capture group to extract the first 5 digits, and the 4 remaining digits are also consumed by the match. If you prefer, you can remove this capture group and put the 4 remaining digits in a lookahead instead (more steps but more efficient).
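The two variants described above can be compared in a small sketch. It uses Python's `re` module rather than PCRE and leaves out all the exclusion rules, so it only illustrates the capture-group versus lookahead choice. The names `SEP`, `FIVE`, `FOUR`, `with_group` and `with_lookahead` are made up for this illustration, and the non-ASCII dash is assumed to be the en dash U+2013.

```python
import re

# Optional separators allowed between digits.
# Assumption: the non-ASCII dash in the pattern is U+2013 (en dash).
SEP = "[._ \u2013-]*"
FIVE = rf"\d(?:{SEP}\d){{4}}"   # first 5 digits, separators allowed in between
FOUR = rf"(?:{SEP}\d){{4}}"     # the 4 remaining digits

# Variant 1: capture the first 5 digits, consume the 4 remaining ones.
with_group = re.compile(rf"({FIVE}){FOUR}")

# Variant 2: no capture group, the 4 remaining digits are only checked
# by a lookahead, so the overall match is just the first 5 digits.
with_lookahead = re.compile(rf"{FIVE}(?={FOUR})")

text = "id 123-45-6789"

m1 = with_group.search(text)
print(m1.group(0))  # whole match, all 9 digits
print(m1.group(1))  # captured first 5 digits

m2 = with_lookahead.search(text)
print(m2.group(0))  # whole match, first 5 digits only
```

With the lookahead variant the wanted text is the whole match, so no group needs to be read afterwards; with the capture variant the 9 digits are consumed and the next search attempt starts after them.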
I started the pattern with (*UTF) since it contains a dash outside the ASCII range.
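To see why the non-ASCII dash matters, the short Python check below (an illustration only, assuming the dash in the character class is the en dash U+2013) shows that it is a single code point but three bytes in UTF-8. Without a UTF mode, an 8-bit regex engine reads the pattern byte by byte, so the class would contain three separate bytes rather than one dash.

```python
# Assumption: the non-ASCII dash in the pattern is U+2013 (en dash).
EN_DASH = "\u2013"
HYPHEN = "-"

en_dash_bytes = EN_DASH.encode("utf-8")
hyphen_bytes = HYPHEN.encode("utf-8")

# One character in both cases, but a different number of bytes.
print(len(EN_DASH), len(en_dash_bytes), en_dash_bytes)  # 1 3 b'\xe2\x80\x93'
print(len(HYPHEN), len(hyphen_bytes), hyphen_bytes)     # 1 1 b'-'
```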
Upvotes: 1
Reputation: 785406
You may try this refactored regex:
\b(?>x|order(?>[\W_]*number)?[\W_]*)\d+(*SKIP)(*F)|(?=(?>[._ –-]*\d){9})(?>(?>9|6{3}|0{3}|(?>\d\D*){3}00|(?>\d\D*){5}0{4})(*SKIP)(*F)|\d(?>\D*\d){4})
Compared with the pattern in your existing demo link, it takes almost half the number of steps.
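For readers whose engine lacks backtracking verbs, the skip-and-fail idea can be approximated with the "match what you do not want, capture what you do want" idiom. The sketch below uses Python's standard `re` module, which does not support (*SKIP)(*F). It is a heavily simplified stand-in and not a port of the regex above: it only skips digits that follow "order" or "order number" and captures 5 digits that are followed by 4 more. The names `PATTERN` and `find_unskipped` are hypothetical.

```python
import re

# Left branch: text to skip (digits after "order" / "order number").
# It matches and consumes that text but sets no capture group.
# Right branch: the wanted text, placed in group 1.
PATTERN = re.compile(
    r"\border(?:[\W_]*number)?[\W_]*\d+"  # skipped, like X(*SKIP)(*F)
    r"|(\d{5})(?=\d{4})",                 # wanted: 5 digits followed by 4 more
    re.IGNORECASE,
)

def find_unskipped(text):
    """Return group 1 of every match that came from the wanted branch."""
    return [m.group(1) for m in PATTERN.finditer(text) if m.group(1) is not None]

print(find_unskipped("Order number: 123456789, other id 234567891"))
```

Because the left branch consumes the digits after "order", the scanner cannot start a new match inside them, which is the same effect (*SKIP)(*F) has in PCRE.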
Upvotes: 2