Reputation: 339
I need a regular expression that will match groups of characters in a string. Here's an example string:
qwwwwwwwwweeeeerrtyyyyyqqqqwEErTTT
It should match
(match group) "result"
(1) "q"
(2) "wwwwwwwww"
(3) "eeeee"
(4) "rr"
(5) "t"
(6) "yyyyy"
(7) "qqqq"
(8) "w"
(9) "EE"
(10) "r"
(11) "TTT"
After doing some research, this is the best I could come up with:
/(.)(\1*)/g
The problem I'm having is that the only way to use the \1 back-reference is to capture the character first. If I could reference the result of a non-capturing group I could solve this problem, but after researching I don't think that's possible.
Upvotes: 4
Views: 4422
Reputation: 4078
How about /((.)(\2*))/g? That way, the outer group captures each run as a whole (I'm assuming that's what you want, and that it's what is missing from the solution you found).
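To see what each group holds, here is a minimal Java sketch (the question is tagged java). The class name GroupDemo and the helper describe are made up for illustration, and the backslash is doubled because the pattern sits in a Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // Group 1 = whole run, group 2 = its first character, group 3 = the repeats after it.
    private static final Pattern RUN = Pattern.compile("((.)(\\2*))");

    // Hypothetical helper: describes every run as "whole|first|rest".
    static List<String> describe(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = RUN.matcher(input);
        while (m.find()) {
            out.add(m.group(1) + "|" + m.group(2) + "|" + m.group(3));
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : describe("qwwweeT")) {
            System.out.println(line);
        }
    }
}
```

Group 1 is always the same text as m.group(), the whole match, so the outer parentheses are only needed if you want the run available as a numbered group.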
Upvotes: 4
Reputation: 7812
Since you did tag java, I'll give an alternative non-regex solution (I believe the requirement is the end product, not the method by which you get there).
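// s is the input string; doSomething is whatever should receive each group
String repeat = "";
char c = '\0'; // '' is not a valid char literal; the initial value is irrelevant because repeat starts empty
for (int i = 0; i < s.length(); i++) {
    if (s.charAt(i) == c) {
        repeat += c; // same character: extend the current group
    } else {
        if (!repeat.isEmpty())
            doSomething(repeat); // add to an array if you want
        c = s.charAt(i); // start a new group
        repeat = "" + c;
    }
}
if (!repeat.isEmpty())
    doSomething(repeat); // flush the last group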
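For anyone who wants to run this, below is a self-contained sketch of the same loop idea. It is a variant rather than the exact code above: it tracks the start index of the current run and uses substring instead of string concatenation, and the names RunSplitter and runs are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class RunSplitter {
    // Splits s into maximal runs of one repeated character, in order of appearance.
    static List<String> runs(String s) {
        List<String> result = new ArrayList<>();
        int start = 0; // index where the current run begins
        for (int i = 1; i <= s.length(); i++) {
            // Close the current run at the end of the string or when the character changes.
            if (i == s.length() || s.charAt(i) != s.charAt(start)) {
                result.add(s.substring(start, i));
                start = i;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (String run : runs("qwwwwwwwwweeeeerrtyyyyyqqqqwEErTTT")) {
            System.out.println(run);
        }
    }
}
```

The comparison is case-sensitive, so "w" and "EE" in the example string end up in separate runs, as the question asks.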
Upvotes: 0
Reputation: 16107
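Assuming the premise @cruncher stated is true: "we want to catch repeating letter groups without knowing beforehand which letter should be repeating" then:
/((a++)|(b++)|(c++)|(d++)|(e++)|(f++)|(g++)|(h++))/
The above regex captures repeating letter groups without hardcoding the order in which they occur. As written it only covers a to h, so extend the alternation to every character you expect.
The ++ is a possessive quantifier meaning "one or more, without giving characters back". A quantifier is either reluctant (*?) or possessive (*+), not both, and + is used instead of * because a* can also match the empty string. Since a possessive quantifier never backtracks, the engine can discard the backtracking positions for it once the current case has matched. Java supports possessive quantifiers; JavaScript regular expressions do not.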
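As a rough illustration of the alternation approach in Java, the sketch below uses the possessive ++ quantifier, which Java's regex engine accepts. It assumes only the letters a to h are of interest, and the names PossessiveRuns and runs are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PossessiveRuns {
    // Assumption: only a..h matter; find() skips over any other character.
    private static final Pattern RUN =
            Pattern.compile("a++|b++|c++|d++|e++|f++|g++|h++");

    // Returns each run of a repeated letter from a..h, in order of appearance.
    static List<String> runs(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = RUN.matcher(input);
        while (m.find()) {
            out.add(m.group()); // the whole match is the run
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(runs("aaabzzccc"));
    }
}
```

Unlike the back-reference pattern, this needs one alternative per character, so it only suits a small, known alphabet.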
Upvotes: 1
Reputation: 1669
Looks like you need to use a Matcher in a loop:
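import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Group 2 is a single character, \\2* its repeats; group 1 wraps the whole run.
Pattern p = Pattern.compile("((.)\\2*)");
Matcher m = p.matcher("qwwwwwwwwweeeeerrtyyyyyqqqqwEErTTT");
while (m.find()) {
    System.out.println(m.group(1));
}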
Outputs:
q
wwwwwwwww
eeeee
rr
t
yyyyy
qqqq
w
EE
r
TTT
Upvotes: 3