HMdeveloper

Reputation: 2884

finding repeated characters in a row (3 times or more) in a string

Here is the code for finding repeated characters, like A in AAbbbc:

String tweet = "abccdef";
Pattern p = Pattern.compile("((\\w)\\2+)+");
Matcher m = p.matcher(tweet);
while (m.find())
{
    System.out.println("Duplicate character " + m.group(0));
}

Now the problem is that I want to find only the characters that are repeated 3 or more times in a row. When I change the 2 to 3 in the above code, it does not work. Can anyone help?

Upvotes: 0

Views: 3713

Answers (3)

Kasravnd

Reputation: 107347

You shouldn't change 2 to 3, because the 2 in \\2 is the number of the capture group being referenced, not a repetition count. You can use two backreferences here:

"((\\w)\\2\\2)+"

But your regex still doesn't match whole strings like your example, since it only matches the repeated characters. For that, you can use the following regex:

"((\\w)\\2+\\2)+.*"
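To see how the three patterns above differ, here is a small sketch that runs each of them against the AAbbbc string from the question. The class name BackreferenceDemo and the helper allMatches are made up for this illustration; only the regexes come from the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackreferenceDemo {
    // Hypothetical helper: collects every full match (group 0) of regex in input.
    static List<String> allMatches(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group(0));
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "AAbbbc";
        // Original pattern: a word character followed by one or more copies of itself.
        System.out.println(allMatches("((\\w)\\2+)+", input));      // [AAbbb]
        // Two backreferences: a word character followed by exactly two copies.
        System.out.println(allMatches("((\\w)\\2\\2)+", input));    // [bbb]
        // Trailing .* also consumes the rest of the string after the run.
        System.out.println(allMatches("((\\w)\\2+\\2)+.*", input)); // [bbbc]
    }
}
```

Note that the original pattern reports AA and bbb as one match because the outer group repeats, and that the last pattern includes the trailing c because of the .* at the end.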

Upvotes: 3

Luv2code

Reputation: 1109

That original regex is flawed. It only finds "word" characters (alpha, numeric, underscore). The requirement is "find characters that repeat 3 or more times in a row." The dot is the any-character metacharacter.

(?=(.)\1{3})(\1+)

So, that will find a character that occurs 4 or more consecutive times (i.e., meets your requirement of a character that "repeats" three or more times). If you really meant "occurs," change the 3 to 2. Anyway, it does a non-consuming "zero-length assertion" before capturing any data, so it should be more efficient. It will only consume and capture data once it has found your minimum requirement (a single character that repeats at least 3 times). You can then consume the run with the one-or-more '+' quantifier because you know it's a match you want; further quantification is redundant, since the positive lookahead has already asserted the minimum. Your results are in capture group 2, "(\1+)", and you can refer to it as \2.

Note: I tested that with the perl command-line utility, so that's the raw regex. You will probably need to escape certain characters before using it in your programming language.
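For Java specifically, escaping means doubling each backslash inside the string literal. Below is a rough sketch of the lookahead pattern used from Java; the class LookaheadRuns, the method runs, and the minRepeats parameter are names invented here, and the sketch assumes minRepeats is at least 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadRuns {
    // Hypothetical helper. minRepeats is how many times the character must
    // repeat AFTER its first occurrence (3 means 4 or more in a row).
    static List<String> runs(String input, int minRepeats) {
        // Raw regex (?=(.)\1{3})(\1+) with the backslashes doubled for Java.
        Pattern p = Pattern.compile("(?=(.)\\1{" + minRepeats + "})(\\1+)");
        List<String> found = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            found.add(m.group(2)); // group 2 holds the whole consumed run
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(runs("xaaaay", 3));    // [aaaa]
        System.out.println(runs("aaab", 3));      // []
        System.out.println(runs("aaab", 2));      // [aaa]
        // The dot also matches non-word characters such as '-' and '!'.
        System.out.println(runs("--- ok!!!", 2)); // [---, !!!]
    }
}
```

Keep in mind that the dot does not match line terminators unless Pattern.DOTALL is set, so runs of newlines are not reported by this sketch.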

Upvotes: 1

Avinash Raj

Reputation: 174816

You may use the repetition quantifier.

Pattern p = Pattern.compile("(\\w)\\1{2,}");
Matcher m = p.matcher(tweet);
while (m.find())
{
   System.out.println("Duplicate character " + m.group(1));
}

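Now the duplicate character is captured by index 1, not index 0, which refers to the whole match. The quantifier counts the copies that follow the first captured character, so \\1{2,} matches a character that appears 3 or more times in a row. Just change the number inside the repetition quantifier to require longer runs; for example, "(\\w)\\1{5,}" matches a character that appears 6 or more times in a row.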
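As a complete, runnable version of the snippet above, here is a minimal sketch with the minimum run length as a parameter. The class RepeatQuantifier, the method repeatedRuns, and the minRun parameter are assumptions made for this example; it expects minRun to be at least 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatQuantifier {
    // Hypothetical helper: returns every run of the same word character
    // that is at least minRun characters long.
    static List<String> repeatedRuns(String input, int minRun) {
        // The capture group consumes the first occurrence, so the
        // backreference only has to match minRun - 1 further copies.
        Pattern p = Pattern.compile("(\\w)\\1{" + (minRun - 1) + ",}");
        List<String> found = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            found.add(m.group(0)); // group 0 is the whole run, group 1 the single character
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(repeatedRuns("AAbbbc", 3));  // [bbb]
        System.out.println(repeatedRuns("abccdef", 3)); // []
        System.out.println(repeatedRuns("abccdef", 2)); // [cc]
    }
}
```

Because \w is used, runs of punctuation or whitespace are ignored; swap in a dot if those should count as well.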

Upvotes: 3
