ManojN

Reputation: 855

Regular Expression to replace non alpha characters with spaces

I have been trying to build a regular expression but haven't been able to get one specific condition to work.

I want a regex that replaces all non-alphanumeric characters with spaces, with the exception of the dash (-). A dash should only be replaced if it is preceded by a space.

For example,

TEST-TEST -TEST#TEST.TEST

should be changed to

TEST-TEST TEST TEST TEST

I have been using [^a-zA-Z0-9] but haven't been able to add an OR condition to it.

Upvotes: 2

Views: 5024

Answers (2)

James King

Reputation: 6343

// Requires: using System.Text.RegularExpressions;
// Skip over '-', grab non-word characters or the ' -' sequence to replace
string pattern = @"(?!-)(\W| -)+";
string replacement = "";
Regex regex = new Regex(pattern);
string result = regex.Replace("Replace - this *@#&@#* string-already", replacement);

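The (?!-) is a zero-width negative lookahead assertion that prevents a match from starting at a '-' character, so a dash between two word characters is left alone. A dash preceded by a space (or by any other non-word character) is consumed as part of the run that started at that character. Note that \W treats digits and underscores as word characters, so those are kept.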

If you're trying to substitute a space instead of completely removing the characters, just change to

string replacement = " ";

The pattern is greedy (note the trailing +), so each run of non-word characters is replaced with a single space.
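To check the pattern against the input from the question, something like the following minimal sketch could be used. The class and method names (LookaheadDemo, Clean) are made up for illustration and are not part of the answer. The second call shows a case worth being aware of: the lookahead is only tested where a match starts, so a dash that directly follows another non-word character is swallowed by \W as well, which may or may not be what you want.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    // Pattern taken from the answer above; the lookahead stops a match
    // from *starting* at '-', but not from continuing through one.
    private static readonly Regex NonWordRun = new Regex(@"(?!-)(\W| -)+");

    // Hypothetical helper: replaces each matched run with a single space.
    public static string Clean(string input)
    {
        return NonWordRun.Replace(input, " ");
    }

    public static void Main()
    {
        // Input from the question.
        Console.WriteLine(Clean("TEST-TEST -TEST#TEST.TEST")); // TEST-TEST TEST TEST TEST

        // The run starts at '#', so the following '-' is consumed too.
        Console.WriteLine(Clean("A#-B")); // A B
    }
}
```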

Upvotes: 2

SiliconChaos

Reputation: 86

Here is what I came up with: (\s-|[^A-Za-z0-9-]). It replaces every non-alphanumeric character but keeps "-", unless the dash has a space before it (" -").

Tested using GNU sed on Linux (\s and \| are GNU extensions); at the moment I don't have access to VS or Mono to test it in C#.

echo "TEST-TEST -TEST#TEST.TEST -1234" | sed 's/\(\s-\|[^A-Za-z0-9-]\)/ /g'

Output

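TEST-TEST TEST TEST TEST 1234

  • () and | are used for the OR condition
  • \s- matches a dash preceded by whitespace, so " -" is replaced as a unit
  • [^A-Za-z0-9-] matches any character that is not alphanumeric or "-", so alphanumerics and the remaining dashes are kept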
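Since the pattern above was only tried in sed, here is a rough C# equivalent as a sketch. It assumes the same expression carries over to .NET unchanged apart from dropping the sed backslash escapes on the group and the |; the names AlternationDemo and Clean are hypothetical. Unlike a pattern with a trailing +, each offending character is replaced on its own, so two adjacent symbols produce two spaces.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AlternationDemo
{
    // Either whitespace followed by '-', or any single character that is
    // not a letter, digit, or '-'. The '-' at the end of the class is literal.
    private static readonly Regex Unwanted = new Regex(@"\s-|[^A-Za-z0-9-]");

    // Hypothetical helper: replaces every match with one space.
    public static string Clean(string input)
    {
        return Unwanted.Replace(input, " ");
    }

    public static void Main()
    {
        // Same input as the sed test above.
        Console.WriteLine(Clean("TEST-TEST -TEST#TEST.TEST -1234")); // TEST-TEST TEST TEST TEST 1234
    }
}
```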

Upvotes: 3
