user2936448

Reputation: 339

Regex to match/group repeating characters in a string

I need a regular expression that will match groups of characters in a string. Here's an example string:

qwwwwwwwwweeeeerrtyyyyyqqqqwEErTTT

It should match

(match group) "result"

(1) "q"

(2) "wwwwwwwww"

(3) "eeeee"

(4) "rr"

(5) "t"

(6) "yyyyy"

(7) "qqqq"

(8) "w"

(9) "EE"

(10) "r"

(11) "TTT"

After doing some research, this is the best I could come up with:

/(.)(\1*)/g

The problem I'm having is that the only way to use the \1 back-reference is to capture the character first, which splits each run across two groups. If I could reference the result of a non-capturing group, I could solve this problem, but after researching I don't think that's possible.

Upvotes: 4

Views: 4422

Answers (4)

SQB

Reputation: 4078

How about /((.)(\2*))/g? That way, you match the group as a whole (I'm assuming that that's what you want, and that's what's lacking from the solution you found).
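To make the group numbering in that pattern concrete, here is a minimal Java sketch. The class and method names (GroupNumbering, describeFirst) are made up for illustration; only the pattern itself comes from the answer. Groups are numbered by the position of their opening parenthesis, so group 1 is the whole run, group 2 is the single captured character, and group 3 is the remaining repeats.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumbering {
    // Hypothetical helper: returns "group1|group2|group3" for the first match,
    // or an empty string if the input contains no match.
    static String describeFirst(String input) {
        Matcher m = Pattern.compile("((.)(\\2*))").matcher(input);
        if (!m.find()) {
            return "";
        }
        return m.group(1) + "|" + m.group(2) + "|" + m.group(3);
    }

    public static void main(String[] args) {
        // Prints: www|w|ww
        System.out.println(describeFirst("wwwee"));
    }
}
```

Note that `.` does not match line terminators by default in Java, so a run of newline characters would be skipped unless the pattern is compiled with Pattern.DOTALL.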

Upvotes: 4

Cruncher

Reputation: 7812

Since you did tag java, I'll give an alternative non-regex solution (I believe in requirements being the end product, not the method by which you get there).

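// s is the input string to split into runs
String repeat = "";
char c = '\0'; // '' is not a valid char literal in Java; '\0' is only a placeholder
for (int i = 0; i < s.length(); i++) {
    if (s.charAt(i) == c) {
        // same character as the current run: extend it
        repeat += c;
    } else {
        // a different character: emit the finished run and start a new one
        if (!repeat.isEmpty()) {
            doSomething(repeat); // add to an array if you want
        }
        c = s.charAt(i);
        repeat = "" + c;
    }
}
// emit the final run
if (!repeat.isEmpty()) {
    doSomething(repeat);
}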
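For anyone who wants to run the loop idea as-is, here is a self-contained sketch of the same approach. It is a variation rather than the exact code above: it tracks the start index of the current run and uses substring instead of string concatenation, and the names RunSplitter and splitRuns are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

class RunSplitter {
    // Splits s into maximal runs of identical characters, in order.
    static List<String> splitRuns(String s) {
        List<String> runs = new ArrayList<>();
        int start = 0; // index where the current run begins
        for (int i = 1; i <= s.length(); i++) {
            // A run ends at the end of the string or when the character changes.
            if (i == s.length() || s.charAt(i) != s.charAt(start)) {
                runs.add(s.substring(start, i));
                start = i;
            }
        }
        return runs;
    }

    public static void main(String[] args) {
        // Prints: [q, wwwwwwwww, eeeee, rr, t, yyyyy, qqqq, w, EE, r, TTT]
        System.out.println(splitRuns("qwwwwwwwwweeeeerrtyyyyyqqqqwEErTTT"));
    }
}
```

The comparison is on char values, so it is case-sensitive ("EE" and "e" are different runs), which matches the expected output in the question.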

Upvotes: 0

Mihai Stancu

Reputation: 16107

Assuming what @cruncher said as a premise is true: "we want to catch repeating letter groups without knowing beforehand which letter should be repeating" then:

/((a*?+)|(b*?+)|(c*?+)|(d*?+)|(e*?+)|(f*?+)|(g*?+)|(h*?+))/

The above RegEx should allow the capture of repeating letter groups without hardcoding a particular order in which they would occur.

The ?+ is a reluctant possessive quantifier, which helps us avoid wasting RAM by not saving previously valid backtracking cases if the current case is valid.

Upvotes: 1

willkil

Reputation: 1669

Looks like you need to use a Matcher in a loop:

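import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Group 2 captures one character; \2* matches its repeats; group 1 is the whole run.
Pattern p = Pattern.compile("((.)\\2*)");
Matcher m = p.matcher("qwwwwwwwwweeeeerrtyyyyyqqqqwEErTTT");
// Each call to find() advances to the next run.
while (m.find()) {
    System.out.println(m.group(1));
}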

Outputs:

q
wwwwwwwww
eeeee
rr
t
yyyyy
qqqq
w
EE
r
TTT
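If you want the runs in a list rather than printed, a possible variation (assuming Java 9 or later, where Matcher.results() is available) is sketched below. It also drops the outer group, since the full match returned by MatchResult.group() is already the whole run. The names RunCollector and runs are made up for this example.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class RunCollector {
    // One captured character followed by zero or more copies of it.
    private static final Pattern RUN = Pattern.compile("(.)\\1*");

    // Collects every full match (group 0) in order of appearance.
    static List<String> runs(String input) {
        return RUN.matcher(input)
                .results()                 // Stream<MatchResult>, Java 9+
                .map(MatchResult::group)   // group() is the entire match
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Prints: [q, wwwwwwwww, eeeee, rr, t, yyyyy, qqqq, w, EE, r, TTT]
        System.out.println(runs("qwwwwwwwwweeeeerrtyyyyyqqqqwEErTTT"));
    }
}
```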

Upvotes: 3
