Reputation: 43
Assume you have a Twitter message similar to the following:
"Hoot, this is soooooo coooool!!!"
I want a Java regex such that String.replaceAll produces the following:
"Hoot, this is so cool!"
I have tried variations of the following, without success:
original.replaceAll("(.)\\1+", "$1");
Does someone know how to come up with a regex that will greedily reduce several consecutive characters to two characters? The solution must not reduce two repeating characters to one (e.g. the word hoot should not reduce to hot).
Upvotes: 2
Views: 953
Reputation: 32532
With pure regex, the best you will get is what dasblinkenlight showed, but your issue extends beyond simply replacing 2+ chars with those 2 chars. What you really want is for it to strip extra repetition for correct spelling of words, given the context of the word.
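Examples from your own message: "soooooo" should become "so", "coooool" should become "cool", and "hoot" must stay "hoot".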
There is no pure regex solution for this. Regex cannot do spell and grammar checking.
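To make that limitation concrete, here is a rough sketch of what a dictionary-backed approach could look like. Everything in it is an assumption for illustration: the class name, the tiny KNOWN_WORDS set (a stand-in for a real word list), and the choice to lowercase input. It needs Java 9+ for Set.of. For each run of repeated letters it tries both the one-letter and the two-letter spelling and keeps the first candidate found in the word list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class DictionarySqueezer {
    // Hypothetical stand-in for a real dictionary or spell checker.
    static final Set<String> KNOWN_WORDS = Set.of("hoot", "this", "is", "so", "cool");

    // Build every spelling obtained by shrinking each run of 2+ identical
    // characters to either one or two characters.
    static List<String> candidates(String word) {
        List<String> results = new ArrayList<>();
        results.add("");
        int i = 0;
        while (i < word.length()) {
            char c = word.charAt(i);
            int j = i;
            while (j < word.length() && word.charAt(j) == c) {
                j++;
            }
            int runLength = j - i;
            List<String> next = new ArrayList<>();
            for (String prefix : results) {
                next.add(prefix + c);          // run reduced to one character
                if (runLength >= 2) {
                    next.add(prefix + c + c);  // run reduced to two characters
                }
            }
            results = next;
            i = j;
        }
        return results;
    }

    // Return the first candidate that is a known word (compared in lowercase),
    // or the input unchanged if no candidate is known.
    static String normalize(String word) {
        for (String candidate : candidates(word.toLowerCase())) {
            if (KNOWN_WORDS.contains(candidate)) {
                return candidate;
            }
        }
        return word;
    }

    public static void main(String[] args) {
        System.out.println(normalize("soooooo")); // so
        System.out.println(normalize("coooool")); // cool
        System.out.println(normalize("hoot"));    // hoot
    }
}
```

Note that the number of candidates doubles with every repeated run, and that a word list containing both "hot" and "hoot" would still leave the choice ambiguous, which is exactly the context problem described above.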
Upvotes: 1
Reputation: 726619
If you need to replace 2+ characters with exactly two, you can slightly modify your expression, like this:
original.replaceAll("(.)\\1+", "$1$1");
However, there is not enough information in a regex to make an exception for "soooooo" and trim it to "so", as opposed to "soo".
Here is a demo on ideone.
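For reference, a small self-contained sketch comparing the expression from the question with the modified one on the sample message. The class and method names are made up for illustration; only String.replaceAll and the two patterns come from the thread.

```java
class RepeatSqueezer {
    // The attempt from the question: every run of 2+ identical
    // characters collapses to a single character.
    static String squeezeToOne(String s) {
        return s.replaceAll("(.)\\1+", "$1");
    }

    // The modified version: every run of 2+ identical characters
    // collapses to exactly two characters.
    static String squeezeToTwo(String s) {
        return s.replaceAll("(.)\\1+", "$1$1");
    }

    public static void main(String[] args) {
        String original = "Hoot, this is soooooo coooool!!!";
        System.out.println(squeezeToOne(original)); // Hot, this is so col!
        System.out.println(squeezeToTwo(original)); // Hoot, this is soo cool!!
    }
}
```

The second call keeps "Hoot" and produces "cool", but it also yields "soo" and "!!", which shows the limitation described above.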
Upvotes: 2