j2query

Reputation: 13

Delete duplicate words in a line using a regular expression

I want to delete duplicate words in a line.

For example:

arraythis1, XdashedSmall, Small, Medium, Large, XdashedLarge, XdashedSmall, Small, Medium, Large, XdashedLarge

I want to remove all of the duplicated items, turning the line into this:

arraythis1, XdashedSmall, Small, Medium, Large

My current regex is: \w(\D+)(?:,\s+\1\b,)+/gm (see regex101).

Upvotes: 1

Views: 72

Answers (3)

Ajay

Reputation: 6590

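I think you should try this:

// Requires: using System.Collections.Generic; using System.Text.RegularExpressions;
var words = new HashSet<string>();
string text = "arraythis1, XdashedSmall, Small, Medium, Large, XdashedLarge, XdashedSmall, Small, Medium, Large, XdashedLarge";
// Match each word together with its leading comma, so dropping a duplicate also drops its separator.
text = Regex.Replace(text, @"(,\s*)?(\w+)", m =>
                 // HashSet.Add returns false when the word (compared case-insensitively) was already seen.
                 words.Add(m.Groups[2].Value.ToUpperInvariant())
                     ? m.Value
                     : String.Empty);
// text is now "arraythis1, XdashedSmall, Small, Medium, Large, XdashedLarge"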

Upvotes: 0

vks

Reputation: 67968

(\b[^,]+),(?=.*\b\1\b)

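Try this. Replace each match with an empty string. It removes the earlier copy of every repeated item and keeps the last one; the spaces in front of the removed items are left behind, so collapse repeated whitespace afterwards. See the demo: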

https://regex101.com/r/sJ9gM7/6

Upvotes: 1

Tim Groeneveld

Reputation: 9019

I am not sure of your exact input, but given this example, if you just want to remove the first "arraythis1", you can just use this regular expression:

   ^[^\,]*
  • The first caret ("^") says "start at the front of the line".
  • The square brackets ("[]") match a single character from the list inside them.
  • A second caret at the start of that list negates it, so "[^\,]" matches any single character that is not a comma.
  • Finally, the asterisk ("*") repeats that match, so the expression consumes every character up to the first comma.

Finally, to complete the regular expression, also match the comma and the space (or spaces) that follow it:

^[^\,]*,\s+

See https://regex101.com/r/oV2aO0/2
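
As a rough illustration of the expression above, here is a small Python sketch (Python and the helper name drop_first_item are assumptions for the example, not something from the question). The comma is written without the backslash, since it does not need escaping inside a character class.

```python
import re

text = "arraythis1, XdashedSmall, Small, Medium, Large"

# ^      anchor at the start of the line
# [^,]*  any run of characters that are not a comma
# ,\s+   the comma itself and the whitespace after it
first_item = re.compile(r"^[^,]*,\s+")


def drop_first_item(line: str) -> str:
    """Hypothetical helper: remove the first comma-separated item of a line."""
    return first_item.sub("", line, count=1)


result = drop_first_item(text)
print(result)
```

A line without any comma is returned unchanged, because the pattern requires a comma followed by whitespace in order to match.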

Upvotes: 0
