Reputation: 618
I have encountered the following regex in the company's code base.
(?:.*)(\\bKEYWORD\\b(?:\\s+)+(?:.*)+;)(?:.*)
Breakdown of Regex:
Non Capturing group: (?:.*)
Capturing group (\\bKEYWORD\\b(?:\\s+)+(?:.*)+;)
2.1 Non Captuing group: (?:\\s+)+
2.2 Non Capturing group: (?:.*)+
Non Capturing group: (?:.*)
The above regex goes into catastrophic bactracking when it fails to match ;
or the test sample becomes too long. Check below the two test samples:
1. -- KEYWORD the procedure if data match between Type 1 and Type 2/3 views is not required.
2. KEYWORD SET MESSAGE_TEXT = '***FAILURE : '||V_G_ERR_MSG_01; /*STATEMENT TO THROW USER DEFINED ERROR CODE AND EXCEPTION TO THE CALLER*/
I went though Runaway Regular Expression's article aswell and tried to use to Atomic Grouping
but still no results. Can anybody help me how to fix this regex ?
Upvotes: 2
Views: 217
Reputation: 18611
According to the link you provided, there are patterns like (x+x+)+
in your expression: (?:\\s+)+
and the subsequent (?:.*)+
. The first matches one or more whitespace characters one or more times, and the second matches indefinite amount of any chars one or more times. This hardly makes sense.
Non-capturing groups are unnecessary here.
Use
.*\\b(KEYWORD\\s.*;).*
See proof
Explanation
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
\b the boundary between a word char (\w) and
something that is not a word char
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
KEYWORD 'KEYWORD'
--------------------------------------------------------------------------------
\s whitespace (\n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
; ';'
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
Upvotes: 2