Brian

Reputation: 197

Regex remove sequential duplicates from a space-delimited string

I'm trying to remove (from a string) only the duplicates that occur sequentially. That is, given the string "1 2 3 3 2 1" only one of the 3's should be removed (i.e. "1 2 3 2 1"). I really thought I had it figured out. And then, during testing, I found a case where it didn't work. I've tried every combination I could think of, to no avail. Surely it's something simple, as it's not a hard match to do (except for me, obviously).

The following JavaScript illustrates the problem. The first testVal string is handled correctly. The commented-out testVal string is not.

// The following string should reduce to: MTC MTCA MTC ORD MTC (it does).
var testVal = "MTC MTC MTCA MTC MTC MTC ORD MTC";

// The following string should reduce to: MTC (it does not.  Result = MTC MTC).
// The string MTC MTC MTC MTC also only reduces to MTC MTC, so I'm thinking
// it's a whitespace issue.
// var testVal = "MTC MTC";

while (/\b(\s*\w+\s*)\b\1/.test(testVal)) {
    testVal = testVal.replace(/\b(\s*\w+\s*)\b\1/g,'$1');
}

alert(testVal);

Upvotes: 1

Views: 333

Answers (1)

Neil

Reputation: 55392

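Your capture group includes the whitespace around the word, so the backreference \1 has to match that same whitespace a second time. Capture only the word and match the separating whitespace outside the group. Try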

/\b(\w+)\s+\1\b/
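
For illustration, here is a minimal sketch of how that pattern could be dropped into the loop from the question. The helper name dedupeSequential is hypothetical (it is not from the original post), and the sketch assumes words consist only of \w characters. The loop is still needed, because a single global replace collapses each matched pair only once, so a run of four identical words takes two passes.

```javascript
// Hypothetical helper: collapses runs of identical adjacent words.
// Assumes words are made of \w characters and separated by whitespace.
function dedupeSequential(input) {
  var result = input;
  // Non-global regex for the test, so no lastIndex state is carried over.
  while (/\b(\w+)\s+\1\b/.test(result)) {
    // Replace each adjacent duplicate pair with a single copy of the word.
    result = result.replace(/\b(\w+)\s+\1\b/g, '$1');
  }
  return result;
}

console.log(dedupeSequential("MTC MTC MTCA MTC MTC MTC ORD MTC")); // MTC MTCA MTC ORD MTC
console.log(dedupeSequential("MTC MTC"));                          // MTC
console.log(dedupeSequential("MTC MTC MTC MTC"));                  // MTC
console.log(dedupeSequential("1 2 3 3 2 1"));                      // 1 2 3 2 1
```

The trailing \b is what keeps "MTC MTCA" from being treated as a duplicate: after the backreference matches "MTC", the next character is "A", so there is no word boundary and the match fails.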

Upvotes: 1
