user2259448

Reputation: 43

How to create a regex that replaces two or more consecutive identical characters with only two?

Suppose you have a Twitter message like the following:

"Hoot, this is soooooo coooool!!!"

I want to come up with a Java regex so that String.replaceAll will result in the following:

"Hoot, this is so cool!"

I started with the following and have tried variations of it without success:

original.replaceAll("(.)\\1+", "$1");

Does anyone know a regex that will greedily reduce a run of repeated consecutive characters to two characters? The solution must not reduce two repeated characters to one (e.g. the word "hoot" should not become "hot").

Upvotes: 2

Views: 953

Answers (2)

CrayonViolent

Reputation: 32532

With pure regex, the best you will get is what dasblinkenlight showed, but your issue extends beyond simply replacing a run of 2+ identical chars with 2 of them. What you really want is to strip the extra repetition so that each word is spelled correctly, given the context the word appears in.

Examples:

  • "this is sooooo cool" to be reduced to "so", not "soo" - strip 1+ to 1
  • "this is so cooooool" to be reduced to "cool" - strip 2+ to 2
  • "this is hooooot" to be reduced to "hot" - strip 1+ to 1 because the intention is the word "hot" not "hoot"
  • "What a hooooooot" to be reduced to "hoot" - strip 2+ to 2 because in this context, the intention is "hoot" not "hot"

There is no pure regex solution for this. Regex cannot do spell and grammar checking.
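To illustrate why something beyond regex is needed, here is a rough sketch of one possible workaround: generate both candidate spellings with regex and pick whichever appears in a word list. The class name `RepeatNormalizer`, the method `normalizeWord`, and the tiny `WORDS` set are all hypothetical, made up for this example; a real word list would have to come from an actual dictionary. It is not a complete solution, just a demonstration of the idea.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class RepeatNormalizer {
    // Hypothetical, hand-picked word list used only for this illustration.
    static final Set<String> WORDS =
            new HashSet<>(Arrays.asList("so", "cool", "hot", "hoot", "this", "is"));

    // Tries the "collapse runs to two" spelling first, then "collapse runs to one".
    static String normalizeWord(String word) {
        String two = word.replaceAll("(.)\\1+", "$1$1"); // runs of 2+ become 2
        String one = word.replaceAll("(.)\\1+", "$1");   // runs of 2+ become 1
        if (WORDS.contains(two.toLowerCase())) {
            return two;
        }
        if (WORDS.contains(one.toLowerCase())) {
            return one;
        }
        return two; // unknown word: fall back to the conservative choice
    }

    public static void main(String[] args) {
        System.out.println(normalizeWord("soooooo")); // so
        System.out.println(normalizeWord("coooool")); // cool
        System.out.println(normalizeWord("hooooot")); // hoot
    }
}
```

Note that "hooooot" comes out as "hoot" here simply because the two-letter form is tried first. Choosing between "hot" and "hoot" still depends on the surrounding sentence, which neither the regex nor the word list can see.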

Upvotes: 1

Sergey Kalinichenko

Reputation: 726619

If you need to replace a run of two or more identical characters with exactly two, you can slightly modify your expression so that the replacement repeats the captured group:

original.replaceAll("(.)\\1+", "$1$1");

However, a regex has no information about which words are valid, so it cannot make an exception for "soooooo" and trim it to "so" rather than "soo".

Here is a demo on ideone.
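In case the linked demo is unavailable, here is a minimal self-contained sketch of the same replacement applied to the sample message from the question. The class name `CollapseRepeats` and the helper `collapseToTwo` are made up for this example; only `String.replaceAll` and the pattern come from the answer itself.

```java
public class CollapseRepeats {
    // Collapses every run of two or more identical characters to exactly two.
    // (.) captures one character, \1+ matches one or more repeats of it,
    // and $1$1 writes the captured character back twice.
    static String collapseToTwo(String s) {
        return s.replaceAll("(.)\\1+", "$1$1");
    }

    public static void main(String[] args) {
        String original = "Hoot, this is soooooo coooool!!!";
        // prints: Hoot, this is soo cool!!
        System.out.println(collapseToTwo(original));
        // For comparison, the replacement "$1" from the question
        // prints: Hot, this is so col!
        System.out.println(original.replaceAll("(.)\\1+", "$1"));
    }
}
```

Note that "Hoot" is left intact, while "soooooo" becomes "soo" and "!!!" becomes "!!", which is the limitation described above.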

Upvotes: 2
