Abhishek Dasgupta
Abhishek Dasgupta

Reputation: 618

Fix for the catastrophic backtracking in the regex expression

I have encountered the following regex in the company's code base.

(?:.*)(\\bKEYWORD\\b(?:\\s+)+(?:.*)+;)(?:.*)

Breakdown of Regex:

  1. Non Capturing group: (?:.*)

  2. Capturing group (\\bKEYWORD\\b(?:\\s+)+(?:.*)+;)

    2.1 Non Captuing group: (?:\\s+)+

    2.2 Non Capturing group: (?:.*)+

  3. Non Capturing group: (?:.*)

The above regex goes into catastrophic bactracking when it fails to match ; or the test sample becomes too long. Check below the two test samples:

1. --      KEYWORD the procedure if data match between Type 1 and Type 2/3 views is not required.
2. KEYWORD SET MESSAGE_TEXT = '***FAILURE : '||V_G_ERR_MSG_01; /*STATEMENT TO THROW USER DEFINED ERROR CODE AND EXCEPTION TO THE CALLER*/

I went though Runaway Regular Expression's article aswell and tried to use to Atomic Grouping but still no results. Can anybody help me how to fix this regex ?

Upvotes: 2

Views: 217

Answers (1)

Ryszard Czech
Ryszard Czech

Reputation: 18611

According to the link you provided, there are patterns like (x+x+)+ in your expression: (?:\\s+)+ and the subsequent (?:.*)+. The first matches one or more whitespace characters one or more times, and the second matches indefinite amount of any chars one or more times. This hardly makes sense.

Non-capturing groups are unnecessary here.

Use

.*\\b(KEYWORD\\s.*;).*

See proof

Explanation

--------------------------------------------------------------------------------
  .*                       any character except \n (0 or more times
                           (matching the most amount possible))
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    KEYWORD                  'KEYWORD'
--------------------------------------------------------------------------------
    \s                       whitespace (\n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
    ;                        ';'
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  .*                       any character except \n (0 or more times
                           (matching the most amount possible))

Upvotes: 2

Related Questions