Reputation: 2884
Here is the code for finding a repeated character, like A in AAbbbc:
String stringToMatch = "abccdef";
Pattern p = Pattern.compile("((\\w)\\2+)+");
Matcher m = p.matcher(stringToMatch);
while (m.find())
{
    System.out.println("Duplicate character " + m.group(0));
}
Now the problem is that I want to find characters that are repeated 3 or more times in a row. When I change 2 to 3 in the above code, it does not work. Can anyone help?
Upvotes: 0
Views: 3713
Reputation: 107347
You shouldn't change 2 to 3, because the 2 is the number of the capture group being referenced, not its frequency. You can use two group references here:
"((\\w)\\2\\2)+"
But your regex still doesn't match strings like your example, since it only matches the repeated characters. For that, you can use the following regex:
"((\\w)\\2+\\2)+.*"
Upvotes: 3
Reputation: 1109
That original regex is flawed. It only finds "word" characters (alpha, numeric, underscore). The requirement is "find characters that repeat 3 or more times in a row." The dot is the any-character metacharacter.
(?=(.)\1{3})(\1+)
So, that will find a character that occurs 4 or more consecutive times (i.e., it meets your requirement of a character that "repeats" three or more times). If you really meant "occurs," change the 3 to 2. The pattern starts with a non-consuming, zero-length assertion (a positive lookahead) before capturing any data, so it should be more efficient: it only consumes and captures data once the minimum requirement is met (a single character that repeats at least 3 times). You can then consume the run with the one-or-more '+' quantifier because you already know it's a match you want; further quantification is redundant, since the positive lookahead has already asserted it. Your results are in capture group 2, "(\1+)", and you can refer to it as \2.
Note: I tested that with the perl command-line utility, so that's the raw regex. You may need to escape certain characters before using it in the programming language you're using.
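For Java specifically, each backslash of the raw regex has to be doubled inside a string literal. Here is a minimal sketch of that translation, assuming the question's Java setup; the class name LookaheadRuns, the method findRuns and the sample input are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadRuns {
    // Raw regex (?=(.)\1{3})(\1+) with every backslash doubled for a Java string literal.
    private static final Pattern RUN = Pattern.compile("(?=(.)\\1{3})(\\1+)");

    // Hypothetical helper: collects every run of 4 or more identical characters.
    static List<String> findRuns(String input) {
        List<String> runs = new ArrayList<>();
        Matcher m = RUN.matcher(input);
        while (m.find()) {
            runs.add(m.group(2)); // group 2 holds the whole run
        }
        return runs;
    }

    public static void main(String[] args) {
        // "ccc" is only 3 long, so it is skipped.
        System.out.println(findRuns("aaaab ccc dddd")); // [aaaa, dddd]
    }
}
```

Keep in mind that, by default, the dot in Java does not match line terminators, which should not matter when looking for repeated visible characters.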
Upvotes: 1
Reputation: 174816
You may use the repetition quantifier.
Pattern p = Pattern.compile("(\\w)\\1{2,}");
// stringToMatch is the input string declared in the question
Matcher m = p.matcher(stringToMatch);
while (m.find())
{
    System.out.println("Duplicate character " + m.group(1));
}
Now the duplicate character is captured by index 1, not index 0, which refers to the whole match. Just change the number inside the repetition quantifier to match a character that is followed by n or more copies of itself, like "(\\w)\\1{5,}" (six or more in a row).
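To avoid editing the pattern by hand for each threshold, the quantifier can be built from a parameter. This is only a sketch: the class name RepeatFinder, the method findRuns and the parameter minCount are hypothetical. Here minCount means the total number of consecutive occurrences, so the quantifier uses minCount - 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatFinder {
    // Hypothetical helper: returns every run of at least minCount identical word characters.
    static List<String> findRuns(String input, int minCount) {
        if (minCount < 2) {
            throw new IllegalArgumentException("minCount must be at least 2");
        }
        // One captured character plus (minCount - 1) or more backreferences to it.
        Pattern p = Pattern.compile("(\\w)\\1{" + (minCount - 1) + ",}");
        Matcher m = p.matcher(input);
        List<String> runs = new ArrayList<>();
        while (m.find()) {
            runs.add(m.group()); // whole run; m.group(1) would be the single character
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(findRuns("AAbbbc", 3)); // [bbb]
        System.out.println(findRuns("AAbbbc", 2)); // [AA, bbb]
    }
}
```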
Upvotes: 3