Kaushik

Reputation: 255

Regex to match two consecutive characters unless followed/preceded by more of the same character

I have a string that looks like this: qqq Eqq Eqqq Cqq Eqq Fq. I want to replace every sequence of exactly two consecutive characters (in this case qq) with an h, so the desired output looks like this: qqq Eh Eqqq Ch Eh Fq

However, I do not want the regex to match inside sequences of more than two q's (qqq or qqqq), which would make the string look like this: hq Eh Ehq Ch Eh Fq. I've tried the below, but it produces exactly that unwanted output.

String text = "qqq Eqq Eqqq Cqq Eqq Fq";
text = text.replaceAll("[q]{2}", "h");

I've also tried only replacing q's followed by a whitespace character but this just ends up matching the last two q's in each word. Is there a way to replace two consecutive characters unless they are followed by a third or fourth of that same character? The language is Java if that helps.

Upvotes: 2

Views: 2643

Answers (2)

Jakub Dąbek

Reputation: 1044

If you want to match any character, not a specific one, you have to use something quite convoluted as Java's regex doesn't support variable-length look-behinds. I came up with this:

(?!([a-z])\1\1)(.)([a-z])\3(?!\3)

Explanation:

(?!            # negative look-ahead
  ([a-z])\1\1  # [group 1] match a letter and 2 more of the same letter
)              # end of the negative look-ahead
(.)            # [group 2] match any character - this is for some other character 
               # before what you want ('E', 'C', or 'F' in your examples)
               # this will not match the repeated character -
               # guaranteed by the previous negative look-ahead
([a-z])        # [group 3] the letter to be replaced
\3             # the same letter (reference to the previous group)
(?!\3)         # negative look-ahead - 
               # makes the pattern not match more than 2 of the same character

You have to replace each match with $2h. $2 is the character captured by (.) in the pattern, so it puts back the character that precedes the pair.

Java demo, regex101 demo
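
As a rough illustration of how the pattern and the $2h replacement fit together in Java, here is a minimal sketch. The class and method names (DoubledLetterDemo, replaceDoubled) are invented for this example and are not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DoubledLetterDemo {
    // Pattern from the answer above: a lowercase letter doubled exactly,
    // preceded by one other character that is captured as group 2.
    private static final Pattern EXACTLY_TWO =
            Pattern.compile("(?!([a-z])\\1\\1)(.)([a-z])\\3(?!\\3)");

    // Hypothetical helper: replaces each doubled letter with `replacement`.
    static String replaceDoubled(String text, String replacement) {
        // $2 restores the character matched by (.) in front of the pair;
        // quoteReplacement keeps any '$' or '\' in `replacement` literal.
        return EXACTLY_TWO.matcher(text)
                .replaceAll("$2" + Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(replaceDoubled("qqq Eqq Eqqq Cqq Eqq Fq", "h"));
        // prints: qqq Eh Eqqq Ch Eh Fq
    }
}
```

Because the pattern consumes one character before the pair, a pair at the very start of the string (for example "qq E") is left unchanged, and only lowercase a-z letters are considered.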

Upvotes: 1

Wiktor Stribiżew

Reputation: 627607

You may use a lookaround-based regex:

String text = "qqq Eqq Eqqq Cqq Eqq Fq";
text = text.replaceAll("(?<!q)q{2}", "h");
System.out.println(text);
// => hq Eh Ehq Ch Eh Fq

See the Java demo and a regex demo.

Details

  • (?<!q) - a negative lookbehind that fails the match if there is a q immediately to the left of the current location
  • q{2} - 2 q chars.

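Note: the pattern above still replaces the first two q's of a longer run. To replace only pairs of q's that are not part of a longer run, which gives the output the question asks for (qqq Eh Eqqq Ch Eh Fq), add a negative lookahead (?!q) at the end: "(?<!q)q{2}(?!q)".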
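
For comparison, here is a small sketch of the variant with both lookarounds applied to the string from the question. The class and method names (ExactlyTwoQ, replacePairs) are placeholders chosen for this example.

```java
class ExactlyTwoQ {
    // Hypothetical helper: replaces "qq" with "h" only when the pair
    // is not part of a longer run of q's.
    static String replacePairs(String text) {
        // (?<!q) : no q immediately before the pair
        // q{2}   : exactly two q characters
        // (?!q)  : no q immediately after the pair
        return text.replaceAll("(?<!q)q{2}(?!q)", "h");
    }

    public static void main(String[] args) {
        System.out.println(replacePairs("qqq Eqq Eqqq Cqq Eqq Fq"));
        // prints: qqq Eh Eqqq Ch Eh Fq
    }
}
```

Since both lookarounds are zero-width, nothing around the pair is consumed, so a pair at the very start or end of the string is replaced as well.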

Upvotes: 5
