Kamesh

Reputation: 75

Improving the performance of a PCRE Regex Pattern

I have the regex below, which is written for the PCRE/PCRE2 flavor. However, it fails with the following error: “Evaluation takes too long. Please check your regular expression.” Is there any way to improve the performance of this regex by simplifying it?

The regex also triggers a "catastrophic backtracking" error.

(\border\D*\W*)\d+(*SKIP)(*F)|(\border\D*number\W*)\d+(*SKIP)(*F)|(?<!x)(?=(?:[._ –-]*\d){9})(?!9|66\D*6|00\D*0|(?:\d\D*){3}0\D*0|(?:\d\D*){5}0(?:\D*0){3})\d(?:[._ –-]*\d){4}

Regex Demo here

The regex above implements a set of rules. The requirements are as follows.

  1. Only the first 5 digits of a 9-digit number should be masked.
  2. No digits should be masked if 'x' or 'X' precedes the 9-digit number.
  3. If the string "order" or "order number" precedes the 9-digit number, it should not be matched.
  4. The full list of use cases, along with the rules, is available at this link. Usecases with requirements

Regex101 gives me exactly the expected output, but the pattern has a performance issue and needs to be simplified.

Upvotes: 2

Views: 380

Answers (2)

Casimir et Hippolyte

Reputation: 89564

Other approach: expand the pattern!

~(*UTF)
    x (?<! \w x ) [\d._ –-]* (*SKIP) (*F)
  |
    order (?<! \w order ) [\W_]* (?:number)? [\W_]* [\d._ –-]* (*SKIP) (*F)
  |
    (?<res>
      [1-578]  (?: [._ –-]{0,3}+ \d ){2}  (?: 0{2}  [\d._ –-]* (*SKIP) (*F) )?
      (?: [._ –-]{0,3}+ \d ){2}
    )
    (?: 0{4} [\d._ –-]* (*SKIP) (*F) )?  (?: [._ –-]{0,3}+ \d ){4}
  |
    (?<res>
      0  (?: 0{2} [\d._ –-]* (*SKIP) (*F) )?  (?: [._ –-]{0,3}+ \d ){2}
      (?: 0{2} [\d._ –-]* (*SKIP) (*F) )?  (?: [._ –-]{0,3}+ \d ){2}
    )
    (?: 0{4} [\d._ –-]* (*SKIP) (*F) )?  (?: [._ –-]{0,3}+ \d ){4}
  |
    (?<res>
      6  (?: 6{2} [\d._ –-]* (*SKIP) (*F) )?  (?: [._ –-]{0,3}+ \d ){2}
      (?: 0{2} [\d._ –-]* (*SKIP) (*F) )?  (?: [._ –-]{0,3}+ \d ){2}
    )
    (?: 0{4} [\d._ –-]* (*SKIP) (*F) )?  (?: [._ –-]{0,3}+ \d ){4}
  |
    9 [\d._ –-]* (*SKIP) (*F)
~iJx

The pattern is indeed longer, but it is 2 times faster and takes 8 times fewer steps.

Note that I use a capture group to extract the first 5 digits, and the 4 remaining digits are also consumed. If you prefer, you can remove this capture group and put the 4 remaining digits in a lookahead (more steps but more efficient).

I started the pattern with (*UTF) since it contains a dash outside the ASCII range.
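To illustrate why the dash matters, here is a small Python check. It assumes the dash in the character class is U+2013 (EN DASH). In UTF-8 that character occupies three bytes, so an engine that reads the pattern byte by byte (as PCRE does without UTF mode) would not see it as one character.

```python
# Assumption: the non-ASCII dash in [._ –-] is U+2013 (EN DASH).
EN_DASH = "\u2013"

# One code point, but three bytes once encoded as UTF-8.
encoded = EN_DASH.encode("utf-8")

print(len(EN_DASH))       # number of code points
print(len(encoded))       # number of UTF-8 bytes
print(EN_DASH.isascii())  # whether the character is in the ASCII range
```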

Upvotes: 1

anubhava

Reputation: 785406

You may try this refactored regex:

\b(?>x|order(?>[\W_]*number)?[\W_]*)\d+(*SKIP)(*F)|(?=(?>[._ –-]*\d){9})(?>(?>9|6{3}|0{3}|(?>\d\D*){3}00|(?>\d\D*){5}0{4})(*SKIP)(*F)|\d(?>\D*\d){4})

RegEx Demo

Compared to your existing demo, it takes almost half the number of steps.
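If the same masking is ever needed in an engine without backtracking control verbs, the (*SKIP)(*F) idea can be approximated with a plain alternation plus a callback. Unwanted contexts are matched and returned unchanged, and only the last branch is a candidate. The following is a rough Python sketch under several assumptions, not a drop-in replacement: the function names and the "*" mask character are hypothetical, 'x' is only treated as a prefix when it starts a word, and a run is masked only when it contains exactly nine digits.

```python
import re

SEP = r"[._ –-]"                   # separators allowed between digits
RUN = rf"[0-9](?:{SEP}*[0-9])*"    # digits with optional separators between them

PATTERN = re.compile(
    rf"\bx{SEP}*{RUN}"                          # x-prefixed run: consumed, left alone
    rf"|\border[\W_]*(?:number[\W_]*)?{RUN}"    # "order" / "order number" prefix: same
    rf"|(?P<run>{RUN})",                        # anything else is a masking candidate
    re.IGNORECASE,
)


def is_plausible(digits: str) -> bool:
    """Mirror the exclusions in the patterns above: a leading 9, a leading
    666 or 000, 00 in positions 4-5, or 0000 in positions 6-9."""
    return not (
        digits[0] == "9"
        or digits[:3] in ("000", "666")
        or digits[3:5] == "00"
        or digits[5:] == "0000"
    )


def mask_first_five(text: str, mask_char: str = "*") -> str:
    """Mask the first five digits of each qualifying nine-digit run."""

    def repl(match: re.Match) -> str:
        run = match.group("run")
        if run is None:
            return match.group(0)   # a skipped context: return it unchanged
        digits = re.sub(r"[^0-9]", "", run)
        if len(digits) != 9 or not is_plausible(digits):
            return run
        out, seen = [], 0
        for ch in run:
            if "0" <= ch <= "9" and seen < 5:
                out.append(mask_char)   # mask a digit, keep separators as they are
                seen += 1
            else:
                out.append(ch)
        return "".join(out)

    return PATTERN.sub(repl, text)


print(mask_first_five("SSN 123-45-6789"))         # SSN ***-**-6789
print(mask_first_five("order number 123456789"))  # unchanged
print(mask_first_five("x123456789"))              # unchanged
```

Because each branch consumes the whole digit run it starts on, the scan never re-enters the middle of a skipped number, which is the effect (*SKIP) has in the PCRE patterns.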

Upvotes: 2
