mohanlon

Reputation: 43

Need regex to replace all symbols surrounded by letters or numbers only

I need a regex that replaces every symbol (or run of symbols) surrounded by letters or numbers with a space. I'll be using C# to run the expression and I'm OK with that part; I'm just stuck on the regex itself.

So after the replacement the following

  1. Type-01 would be Type 01
  2. 01)* would still be 01)*
  3. -Category:Toys would still be -Category:Toys
  4. White:Black would be White Black

Current Expression

(?<=\w)[^a-zA-Z0-9Category:]+(?=\w)

Input string is

-Category:Toys AND (Teddy Bear Type-01*) OR (Teddy Bear White:Black)

Required output

-Category:Toys AND (Teddy Bear Type 01*) OR (Teddy Bear White Black)

But what I'm getting is

-Category:Toys AND Teddy Bear Type 01 OR Teddy Bear White:Black)

Not sure if I'm just missing something simple or have got the wrong end of the stick.

Upvotes: 4

Views: 923

Answers (2)

Stephen Walker

Reputation: 574

For C#, you can use the Regex.Replace method.

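using System.Text.RegularExpressions;
string a = "Category:Toys AND (Teddy Bear Type-01*) OR (Teddy Bear White/Black)";
// Replace every character that is not a letter, digit, (, ), * or : with a space.
string s = Regex.Replace(a, @"[^()*:A-Za-z0-9]", " ");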

Upvotes: 0

stema

Reputation: 92986

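You can't put words into a character class. Every character between the brackets is added to the class individually, and the order doesn't matter. In your expression, the letters of "Category" are already covered by a-zA-Z, so the only thing "Category:" really adds to the class is the colon.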
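To make that concrete, here is a small sketch comparing the pattern from the question with the same pattern minus the word. The class and method names (CharClassDemo, Apply) are only made up for illustration; the two patterns should behave identically.

```csharp
using System;
using System.Text.RegularExpressions;

static class CharClassDemo
{
    // The pattern from the question: "Category:" sits inside the brackets.
    public const string WithWord = @"(?<=\w)[^a-zA-Z0-9Category:]+(?=\w)";

    // The same class with the redundant letters removed; only ':' was new.
    public const string WithoutWord = @"(?<=\w)[^a-zA-Z0-9:]+(?=\w)";

    // Hypothetical helper: replace every match with a single space.
    public static string Apply(string pattern, string input) =>
        Regex.Replace(input, pattern, " ");

    static void Main()
    {
        string input = "-Category:Toys AND (Teddy Bear Type-01*) OR (Teddy Bear White:Black)";
        Console.WriteLine(Apply(WithWord, input));
        Console.WriteLine(Apply(WithoutWord, input));
        // Both lines print:
        // -Category:Toys AND Teddy Bear Type 01 OR Teddy Bear White:Black)
    }
}
```

The parentheses and the asterisk disappear because they are not excluded from the class, and no colon is ever replaced because the colon is excluded.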

I am not sure if it is sufficient for you, but for your example, this will work:

(?<=\w)[^a-zA-Z0-9*:()\s]+(?=\w)

and replace with a single space.

I would also make it more Unicode-aware:

(?<=\w)[^\p{L}0-9*:()\s]+(?=\w)

Where \p{L} is a Unicode property for a letter in any language.
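A quick sketch of why this matters in .NET, where \w already matches non-ASCII letters but a-zA-Z does not. The sample word "Größe-01" and the names below (UnicodeDemo, Apply) are my own, not from the question.

```csharp
using System;
using System.Text.RegularExpressions;

static class UnicodeDemo
{
    // ASCII-only letter range: 'ö' and 'ß' are not excluded, so they count as symbols.
    public const string AsciiPattern = @"(?<=\w)[^a-zA-Z0-9*:()\s]+(?=\w)";

    // \p{L} excludes letters of any script from the match.
    public const string UnicodePattern = @"(?<=\w)[^\p{L}0-9*:()\s]+(?=\w)";

    // Hypothetical helper: replace every match with a single space.
    public static string Apply(string pattern, string input) =>
        Regex.Replace(input, pattern, " ");

    static void Main()
    {
        string input = "Größe-01";
        Console.WriteLine(Apply(AsciiPattern, input));   // Gr e 01
        Console.WriteLine(Apply(UnicodePattern, input)); // Größe 01
    }
}
```

With the ASCII range the umlaut and the sharp s are treated as symbols and replaced; with \p{L} only the hyphen is.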

See it here on Regexr

Update:

If you want to keep the colon only when "Category" comes directly before it, you could do it like this:

(?<=\w)(?:[^a-zA-Z0-9*()\s:]+|(?<!Category):)(?=\w)

See it on Regexr

I added the colon to the negated character class so that the first alternative never replaces a colon. Then I added a second alternative that says: replace the colon, but only if it is not preceded by "Category".
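For completeness, a minimal C# sketch that runs the updated expression against the input from the question. The class and method names (SymbolCleaner, Clean) are placeholders I chose, and the replacement is a single space as described above.

```csharp
using System;
using System.Text.RegularExpressions;

static class SymbolCleaner
{
    // First alternative: a run of symbols other than * ( ) : and whitespace.
    // Second alternative: a colon that is not preceded by "Category".
    // Both must sit between two word characters (the lookbehind and lookahead).
    const string Pattern = @"(?<=\w)(?:[^a-zA-Z0-9*()\s:]+|(?<!Category):)(?=\w)";

    // Hypothetical helper: replace every match with a single space.
    public static string Clean(string input) => Regex.Replace(input, Pattern, " ");

    static void Main()
    {
        string input = "-Category:Toys AND (Teddy Bear Type-01*) OR (Teddy Bear White:Black)";
        Console.WriteLine(Clean(input));
        // -Category:Toys AND (Teddy Bear Type 01*) OR (Teddy Bear White Black)
    }
}
```

The four cases from the question come out as required: Type-01 becomes Type 01, White:Black becomes White Black, and 01)* and -Category:Toys are left alone.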

Upvotes: 2
