Reputation: 844
Using Java, i want to detect if a line starts with words and separator then "myword", but this regex takes too long. What is incorrect ?
^\s*(\w+(\s|/|&|-)*)*myword
Upvotes: 0
Views: 515
Reputation: 626689
The pattern ^\s*(\w+(\s|/|&|-)*)*myword
is not efficient due to the nested quantifier. \w+
requires at least one word character and (\s|/|&|-)*
can match zero or more of some characters. When the *
is applied to the group and the input string has no separators in between word characters, the expression becomes similar to a (\w+)*
pattern that is a classical catastrophical backtracking issue pattern.
Just a small illustration of \w+
and (\w+)*
performance:
\w+
: (\w+)*
You pattern is even more complicated and invloves more those backtracking steps. To avoid such issues, a pattern should not have optional subpatterns inside quantified groups. That is, create a group with obligatory subpatterns and apply the necessary quantifier to the group.
In this case, you can unroll the group you have as
String rx = "^\\s*(\\w+(?:[\\s/&-]+\\w+)*)[\\s/&-]+myword";
See IDEONE demo
Here, (\w+(\s|/|&|-)*)*
is unrolled as (\w+(?:[\s/&-]+\w+)*)
(I kept the outer parentheses to produce a capture group #1, you may remove these brackets if you are not interested in them). \w+
matches one or more word characters (so, it is an obligatory subpatter), and the (?:[\s/&-]+\w+)*
subpattern matches zero or more (*
, thus, this whole group is optional) sequences of one or more characters from the defined character class [\s/&-]+
(so, it is obligatory) followed with one or more word characters \w+
.
Upvotes: 4