Reputation: 13357
I have the following regex and input:
Basically, I want to match on the very last "yo" and keep everything in green (group(1)).
This works fine for small files/input.
However, if I run this from within Java against a very large (100k) file in which there are no pattern matches (just a bunch of text, a War & Peace snippet), it can take 10+ seconds to return from trying to find a match. I am assuming this is a backtracking issue with the regex, specifically the (.*) group(1) match.
What can I do to prevent backtracking per the use case and speed up this regex to satisfy the above requirements?
-- Java Code --
// Works fine for this small snippet, but when run against the large (100k)
// input described above, serious performance issues start happening.
String text = "Hi\n\nyo keep this here\n\nKeep this here\n\nyo\nkey match line here cut me:\n\nAll of this here should be deleted";
System.out.println(text);
Pattern PATTERN = Pattern.compile("^(.*)((\\byo\\b.*?(cut me:).*))$",
        Pattern.MULTILINE | Pattern.DOTALL);
Matcher m = PATTERN.matcher(text);
if (m.find()) {
    text = m.group(1);
    System.out.println(text);
}
Upvotes: 4
Views: 68
Reputation: 785128
Try this regex:
^([\s\S]*)\byo\b[\s\S]*?(cut me:)
Without the m (MULTILINE) and s (DOTALL) flags.
In my testing this turns out to be faster than your regex. (You can also check it in regex101's debugger.)
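For reference, here is a minimal Java sketch of the regex above. The class name LastYoTrimmer and the method keepBeforeLastYo are made up for illustration and are not from the question. Note the doubled backslashes in the string literal, and that no flags are passed to Pattern.compile. Without MULTILINE, ^ can only succeed at the very start of the input, so the greedy group is not retried from every line start. That retrying is likely where much of the time went on large input with no match, though I have not profiled it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LastYoTrimmer {
    // No MULTILINE or DOTALL flags: ^ anchors only at the start of the input,
    // and [\s\S] stands in for "any character, including newlines".
    private static final Pattern PATTERN =
            Pattern.compile("^([\\s\\S]*)\\byo\\b[\\s\\S]*?(cut me:)");

    // Returns the text before the last "yo" that is followed by "cut me:",
    // or the input unchanged when the pattern does not match.
    static String keepBeforeLastYo(String text) {
        Matcher m = PATTERN.matcher(text);
        return m.find() ? m.group(1) : text;
    }

    public static void main(String[] args) {
        String text = "Hi\n\nyo keep this here\n\nKeep this here\n\nyo\n"
                + "key match line here cut me:\n\nAll of this here should be deleted";
        System.out.println(keepBeforeLastYo(text));
    }
}
```

Returning the input unchanged on a failed match is only one possible choice. The code in the question likewise leaves text untouched when find() returns false.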
Upvotes: 2