genxgeek

Reputation: 13357

Possible regex backtracking issues?

I have the following regex and input:

http://regex101.com/r/cI3fG4

Basically, I want to match the very last "yo" and keep everything before it (group(1), highlighted in green on regex101).

This works fine for small files/input.

However, if I run this from Java against a very large (100k) file in which there are no matches (just a block of text, a War and Peace snippet), find() can take 10+ seconds to return. I assume this is a backtracking issue with the regex (specifically the (.*) in group(1)).

What can I do to prevent this backtracking and speed up the regex while still satisfying the requirements above?

-- Java Code --

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class YoCut {
        // Works fine for this small snippet, but when run against 100k of input
        // with no match (as described above), find() can take 10+ seconds.
        private static final Pattern PATTERN = Pattern.compile(
                "^(.*)((\\byo\\b.*?(cut me:).*))$",
                Pattern.MULTILINE | Pattern.DOTALL);

        public static void main(String[] args) {
            String text = "Hi\n\nyo keep this here\n\nKeep this here\n\nyo\nkey match line here cut me:\n\nAll of this here should be deleted";
            System.out.println(text);
            Matcher m = PATTERN.matcher(text);
            if (m.find()) {
                text = m.group(1); // everything before the last "yo" that precedes "cut me:"
                System.out.println(text);
            }
        }
    }

Upvotes: 4

Views: 68

Answers (1)

anubhava

Reputation: 785128

Try this regex:

^([\s\S]*)\byo\b[\s\S]*?(cut me:)

Use it without the m (MULTILINE) and s (DOTALL) flags; [\s\S] already matches any character, including newlines.
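A minimal sketch of the same regex used from Java, assuming you only need group(1). The class and method names (AnswerRegex, keepPart) are hypothetical. Note the doubled backslashes in the string literal and that Pattern.compile is called with no flags at all:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnswerRegex {
    // The answer's regex as a Java string literal. No MULTILINE or DOTALL:
    // ^ anchors only at the start of the input, and [\s\S] matches any
    // character, including line breaks.
    static final Pattern PATTERN =
            Pattern.compile("^([\\s\\S]*)\\byo\\b[\\s\\S]*?(cut me:)");

    /** Returns everything before the last "yo" that precedes "cut me:", or null if nothing matches. */
    static String keepPart(String text) {
        Matcher m = PATTERN.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "Hi\n\nyo keep this here\n\nKeep this here\n\nyo\nkey match line here cut me:\n\nAll of this here should be deleted";
        System.out.println(keepPart(text));
    }
}
```

The greedy [\s\S]* runs to the end of the input and then gives characters back, so the first "yo" it lands on while backing off is the last one in the text, which is what group(1) should stop at.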

Online Demo: http://regex101.com/r/lC9yZ5

In my testing this is faster than your regex. (You can also check this in regex101's debugger.)
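The answer does not say why it is faster. One plausible reason (an assumption, not something measured here) is dropping the m flag: with MULTILINE, ^ matches at the start of every line, so when the input contains no match, find() restarts the greedy (.*) from every line start, and each attempt backtracks over the rest of the input. Without MULTILINE, ^ matches only at the start of the input, so that expensive attempt happens once. The hypothetical AnchorCount class below just counts the positions where ^ can match under each setting:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnchorCount {
    /** Counts how many (possibly zero-length) matches the pattern finds in text. */
    static int countMatches(Pattern p, String text) {
        Matcher m = p.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "line one\nline two\nline three\n";
        // With MULTILINE, ^ matches at the start of the input and after each
        // line terminator (except one at the very end of the input).
        System.out.println(countMatches(Pattern.compile("^", Pattern.MULTILINE), text)); // prints 3
        // Without it, ^ matches only at the very start of the input.
        System.out.println(countMatches(Pattern.compile("^"), text)); // prints 1
    }
}
```

On a 100k file with thousands of lines, that difference multiplies the backtracking work of the original pattern by roughly the number of lines.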

Upvotes: 2
