HMdeveloper

Reputation: 2884

Regex pattern in Java hangs on just one specific text

I have the following pattern to be checked against any text:

public static boolean endWithLinkOrHashAt(String commentstr)
{
    String urlPattern = "^[@|#]((?:\\w+\\s?){1,}).*:\\s[^?]?((?:\\w+\\s?){1,})[^?]((?:http|https):\\/\\/\\S+)(\\s[@|#]\\w+){0,}[^?]$";
    Pattern p = Pattern.compile(urlPattern,Pattern.CASE_INSENSITIVE);
    Matcher m = p.matcher(commentstr);
    if (m.find()) {
        System.out.println("yes");
        return true;
    }
    return false;
}

When I try it with the following text, the program hangs; the console runs forever and shows no result or error:

endWithLinkOrHashAt("#BREAKING: @Baird resigning in aftermath of controversial win over @pmharper in game of #Trouble (with the pop-o-matic bubble) #cdnpoli");

Is anything wrong with my regex? It works with other texts and seems to have a problem only with this specific text.

Update:

Here is what I want my pattern to check against:

@ or # + 1 or 2 words + : + 1 or more words + link + nothing, or any number of words that begin with # or @

Upvotes: 4

Views: 382

Answers (1)

Wiktor Stribiżew

Reputation: 626802

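The problem with your regex appears to be catastrophic backtracking. The root cause is nested quantifiers: in a group such as (?:\w+\s?){1,}, the \s? is optional, so a run of word characters can be divided among the group's repetitions in many different ways, and the engine may try every one of them before giving up on a text that does not match.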
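To get a feel for the numbers, here is a rough sketch that counts how many ways a run of n word characters can be cut into consecutive non-empty chunks, which is the choice a group like (?:\w+\s?){1,} faces on each word. The class and method names are made up for illustration, and this is only a counting argument, not a model of what Java's regex engine does internally (an engine may prune some of these paths).

```java
final class SplitCounter {

    // Number of ways to cut a run of n word characters into consecutive
    // non-empty chunks, i.e. how many ways repeated \w+ can cover the run.
    static long countSplits(int n) {
        long[] ways = new long[n + 1];
        ways[0] = 1; // the empty remainder can be covered in exactly one way
        for (int len = 1; len <= n; len++) {
            // Pick the length of the first chunk, then cover the rest.
            for (int first = 1; first <= len; first++) {
                ways[len] += ways[len - first];
            }
        }
        return ways[n];
    }

    public static void main(String[] args) {
        // "BREAKING" is a run of 8 word characters.
        System.out.println(countSplits(8));
        System.out.println(countSplits(20));
    }
}
```

The count doubles with every extra character (2^(n-1) for a run of n), and the counts for separate words multiply, so a sentence of ordinary length is already enough to make a failing match take practically forever.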

I suggest using a more linear regex:

(?i)^[@#](\\S+(?:\\s+\\S+)?)\\s*:\\s*(\\S+(?:\\s+\\S+)*)\\s*(https?://\\S*)((?:\\s+(?=[#@])\\S+)*)\\s*$


It is basically the same regex I suggested before; I just added more whitespace matching (\s* and \s+) to it.
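As a minimal sketch, this is how the method from the question might look with the suggested pattern dropped in. The class name is hypothetical, the sample strings other than the one from the question are invented for illustration, and the inline (?i) is replaced by the equivalent Pattern.CASE_INSENSITIVE flag; the method name is kept from the question.

```java
import java.util.regex.Pattern;

final class CommentLinkMatcher {

    // Compiled once and reused. Every repeated group is separated from the
    // next by mandatory whitespace (\s+), so a word cannot be split between
    // repetitions the way it could with (?:\w+\s?){1,}.
    private static final Pattern LINK_PATTERN = Pattern.compile(
            "^[@#](\\S+(?:\\s+\\S+)?)"      // @ or #, then one or two words
            + "\\s*:\\s*"                   // colon with optional whitespace
            + "(\\S+(?:\\s+\\S+)*)"         // one or more words
            + "\\s*(https?://\\S*)"         // the link
            + "((?:\\s+(?=[#@])\\S+)*)"     // optional trailing #tags or @mentions
            + "\\s*$",
            Pattern.CASE_INSENSITIVE);

    static boolean endWithLinkOrHashAt(String commentstr) {
        return LINK_PATTERN.matcher(commentstr).find();
    }

    public static void main(String[] args) {
        // Invented sample that follows the described format.
        System.out.println(endWithLinkOrHashAt(
                "@John Smith: check this out http://example.com #news @bob"));
        // The text from the question; it contains no link.
        System.out.println(endWithLinkOrHashAt(
                "#BREAKING: @Baird resigning in aftermath of controversial win over "
                + "@pmharper in game of #Trouble (with the pop-o-matic bubble) #cdnpoli"));
    }
}
```

The text from the question contains no link, so it is expected not to match; the point is that the method now returns promptly instead of hanging.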

Upvotes: 2
