mitch

Reputation: 2245

JavaScript: remove duplicate letters and repeated sequences in a string

There are many posts like this, and I have found a few solutions, but they are not perfect. One of them:

"aabbhahahahahahahahahahahasetsetset".replace(/[^\w\s]|(.+)\1+/gi, '$1')

The result is:

abhahahahahahaset

The result I want is:

abhaset

How can I do this?

Upvotes: 2

Views: 118

Answers (2)

Martin Ender

Reputation: 44259

.+ is greedy: it takes as much as it can. Here that is half of the run of ha's, so that \1 can match the second half. Making the repetition ungreedy should do the trick:

/[^\w\s]|(.+?)\1+/gi

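By the way, the i flag doesn't change anything for this all-lowercase input. It only matters for mixed-case strings, where it lets \1 match its capture case-insensitively.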
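To see when the i flag makes a difference, here is a minimal sketch that runs the same pattern with and without it. The sample strings and variable names are made up for illustration.

```javascript
// Same pattern as above, once with and once without the i flag.
const withI = /[^\w\s]|(.+?)\1+/gi;
const withoutI = /[^\w\s]|(.+?)\1+/g;

const lower = "aabbsetset"; // hypothetical all-lowercase input
const mixed = "aAbB";       // hypothetical mixed-case input

// All lowercase: both variants give the same result.
const lowerWithI = lower.replace(withI, "$1");
const lowerWithoutI = lower.replace(withoutI, "$1");
console.log(lowerWithI, lowerWithoutI); // abset abset

// Mixed case: with i, \1 also matches the other case of the captured text.
const mixedWithI = mixed.replace(withI, "$1");
const mixedWithoutI = mixed.replace(withoutI, "$1");
console.log(mixedWithI, mixedWithoutI); // ab aAbB
```

So for input like the one in the question, dropping the i is harmless; keep it only if "aA" should count as a repetition.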

To get rid of nested repetitions (e.g. transform aaBBaaBB into aB (via aaBB or aBaB)) simply run the replacement multiple times until the result does not change any more.

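// Remove punctuation and collapse each immediate repetition to a single copy.
var pattern = /[^\w\s]|(.+?)\1+/g;

var output = "aaBBaaBB";
var input;

// Repeat until a pass leaves the string unchanged.
do
{
    input = output;
    output = input.replace(pattern, "$1");
} while (input !== output);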

I admit the naming of output is a bit awkward for the first iteration, but you know... the two most difficult problems in computer science are cache invalidation, naming things and off-by-one errors.
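If you need this in more than one place, the loop can be wrapped in a function. This is only a sketch: the name collapseRepeats is made up, and the second test string is built with repeat() so that the number of ha's (11 here, an arbitrary choice) is explicit.

```javascript
// Hypothetical helper: apply the replacement until the string stops changing.
function collapseRepeats(text) {
  const pattern = /[^\w\s]|(.+?)\1+/g;
  let previous;
  let current = text;
  do {
    previous = current;
    current = previous.replace(pattern, "$1");
  } while (previous !== current);
  return current;
}

console.log(collapseRepeats("aaBBaaBB")); // aB

// Input shaped like the one in the question.
const question = "aabb" + "ha".repeat(11) + "setsetset";
console.log(collapseRepeats(question)); // abhaset
```

The loop always terminates, because every pass that changes anything makes the string shorter.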

Upvotes: 4

Explosion Pills

Reputation: 191729

.+ will match the maximum amount possible, so hahahaha satisfies (.+)\1 with haha and haha. You want to match the minimum amount possible, so use a reluctant quantifier.

"aabbhahahahahahahahahahahasetsetset".replace(/[^\w\s]|(.+?)\1+/gi, '$1')

http://jsfiddle.net/HQRDg/
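As a rough illustration of the difference, you can inspect what the capture group actually grabs in each case. The variable names below are made up, and the sample is the hahahaha string from the explanation above.

```javascript
// Four repetitions of "ha", i.e. "hahahaha".
const sample = "ha".repeat(4);

// Greedy: the group grabs the longest text that is still followed by a copy of itself.
const greedyMatch = /(.+)\1+/.exec(sample);
console.log(greedyMatch[1]); // haha

// Reluctant: the group grabs the shortest text that is followed by a copy of itself.
const reluctantMatch = /(.+?)\1+/.exec(sample);
console.log(reluctantMatch[1]); // ha

// Replacing the whole match with the group therefore leaves different results.
const greedyResult = sample.replace(/(.+)\1+/g, "$1");
const reluctantResult = sample.replace(/(.+?)\1+/g, "$1");
console.log(greedyResult, reluctantResult); // haha ha
```

In both cases the whole run is matched; only the part kept as $1 differs.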

Upvotes: 2
