coderman

Reputation: 1514

Removing a set of characters (including the space character) with Regex doesn't work

Currently I am using a StringBuilder to remove a list of characters from a string, as below:

char[] charArray = { 
  '%', '&', '=', '?', '{', '}', '|', '<', '>', 
  ';', ':', ',', '"', '(', ')', '[', ']', '\\', 
  '/', '*', '+', ' ' };

// Remove special characters that aren't allowed

var sanitizedAddress = new StringBuilder();
foreach (var character in emailAddress.ToCharArray())
{
  if (Array.IndexOf(charArray, character) < 0)
    sanitizedAddress.Append(character);
}

I tried to do the same with Regex as follows, but it doesn't work:

var invalidCharacters = Regex.Escape(@"%&=?{}|<>;:,\"()[]\\/*+\s");
emailAddress = Regex.Replace(emailAddress, invalidCharacters, "");

Upvotes: 1

Views: 67

Answers (2)

Dmitrii Bychenko

Reputation: 186678

You can try using LINQ instead of regular expressions, filtering out the unwanted characters with Where:

using System.Collections.Generic;
using System.Linq;

...

// HashSet<char>.Contains is O(1) on average, vs. O(N) for scanning an array
HashSet<char> toRemove = new HashSet<char>() { 
    '%', '&', '=', '?', '{', '}', '|', '<', '>', 
    ';', ':', ',', '"', '(', ')', '[', ']', '\\', 
    '/', '*', '+', ' ' };

string emailAddress = ...

emailAddress = string.Concat(emailAddress
  .Where(c => !toRemove.Contains(c)));

You can chain more Where calls, e.g.:

emailAddress = string.Concat(emailAddress
  .Where(c => !toRemove.Contains(c))
  .Where(c => !char.IsWhiteSpace(c))); // also removes tabs, newlines and other whitespace
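Putting the two Where filters together, a minimal self-contained sketch might look like this. The class name LinqSanitizer, the method Sanitize and the sample input are hypothetical, chosen only for illustration; the character set is the one from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LinqSanitizer
{
    // Characters to strip (taken from the question); HashSet gives O(1) average lookups.
    private static readonly HashSet<char> ToRemove = new HashSet<char>
    {
        '%', '&', '=', '?', '{', '}', '|', '<', '>',
        ';', ':', ',', '"', '(', ')', '[', ']', '\\',
        '/', '*', '+', ' '
    };

    // Keeps every character that is neither in ToRemove nor whitespace.
    public static string Sanitize(string input) =>
        string.Concat(input
            .Where(c => !ToRemove.Contains(c))
            .Where(c => !char.IsWhiteSpace(c)));

    public static void Main()
    {
        // Hypothetical sample input, for illustration only.
        Console.WriteLine(Sanitize("john (doe)\t<john.doe@example.com>;"));
        // prints: johndoejohn.doe@example.com
    }
}
```

Because the second Where uses char.IsWhiteSpace, the explicit ' ' in the set becomes redundant, but it does no harm.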

If you insist on regular expressions, you have to build the pattern, e.g.:

  char[] charArray = {
    '%', '&', '=', '?', '{', '}', '|', '<', '>',
    ';', ':', ',', '"', '(', ')', '[', ']', '\\',
    '/', '*', '+', ' ' };

  // Join all the characters (each one escaped!) with | ("or" in regular expressions)
  string pattern = string.Join("|", charArray
    .Select(c => Regex.Escape(c.ToString())));

Then call Replace:

  emailAddress = Regex.Replace(emailAddress, pattern, "");
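A self-contained sketch of the alternation approach follows. As an assumption of this sketch (not something the answer prescribes), the Regex is constructed once and kept in a static readonly field so the pattern is not re-parsed on every call; AlternationSanitizer, Sanitize and the sample input are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class AlternationSanitizer
{
    // The character set from the question.
    private static readonly char[] CharArray =
    {
        '%', '&', '=', '?', '{', '}', '|', '<', '>',
        ';', ':', ',', '"', '(', ')', '[', ']', '\\',
        '/', '*', '+', ' '
    };

    // Escape each character on its own, then join the pieces with '|' (alternation).
    // Static fields initialize in textual order, so CharArray is ready here.
    private static readonly Regex Pattern = new Regex(
        string.Join("|", CharArray.Select(c => Regex.Escape(c.ToString()))));

    // Replaces every match (a single unwanted character) with the empty string.
    public static string Sanitize(string input) => Pattern.Replace(input, "");

    public static void Main()
    {
        // Hypothetical sample input, for illustration only.
        Console.WriteLine(Sanitize("a b%c[d]@example.com"));
        // prints: abcd@example.com
    }
}
```

Since every alternative is a single character, a character class ([...]) expresses the same thing more compactly.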

Upvotes: 1

qbik

Reputation: 5908

A character class [...] matches any single character it lists, so you can use one for this:

var invalidCharacters = "[" + Regex.Escape(@"%&=?{}|<>;:,""()\*/+") + @"\]\[\s]";
emailAddress = Regex.Replace(emailAddress, invalidCharacters, "");

Some side notes:

  • in a verbatim (@"...") string, a double quote is written as "", not \"
  • \s is already an escape sequence, so Regex.Escape turns it into \\s (a literal backslash followed by s), which is not what you want
  • Regex.Escape does not escape the ] character, which would close the character class early; that's why \] and \[ are appended separately
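This is also why the question's attempt fails: Regex.Escape applied to the whole string produces a pattern that matches only that exact sequence of characters, whereas [...] matches any one of them. As an alternative to mixing Regex.Escape with hand-written escapes, the sketch below builds the class straight from the char[], putting a backslash only in front of characters that are (or can be) special inside a .NET character class: \, ], [, ^ and -. CharClassSanitizer, BuildCharacterClass and the sample input are hypothetical; note that this sketch uses a literal space rather than \s, so tabs and newlines are kept.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class CharClassSanitizer
{
    // The character set from the question.
    private static readonly char[] CharArray =
    {
        '%', '&', '=', '?', '{', '}', '|', '<', '>',
        ';', ':', ',', '"', '(', ')', '[', ']', '\\',
        '/', '*', '+', ' '
    };

    // Characters with a (potential) special meaning inside [...] in .NET regex.
    private const string ClassMetaChars = @"\]-^[";

    // Builds "[...]", escaping only the class metacharacters.
    // Assumes a non-empty array: "[]" is not a valid pattern.
    public static string BuildCharacterClass(char[] chars)
    {
        var sb = new StringBuilder("[");
        foreach (char c in chars)
        {
            if (ClassMetaChars.IndexOf(c) >= 0)
                sb.Append('\\');
            sb.Append(c);
        }
        return sb.Append(']').ToString();
    }

    public static string Sanitize(string input) =>
        Regex.Replace(input, BuildCharacterClass(CharArray), "");

    public static void Main()
    {
        Console.WriteLine(BuildCharacterClass(CharArray));
        // prints: [%&=?{}|<>;:,"()\[\]\\/*+ ]
        Console.WriteLine(Sanitize("a b%c[d]\\e@example.com"));
        // prints: abcde@example.com
    }
}
```

To strip all whitespace as in the answer, append \s inside the brackets instead of (or in addition to) the literal space.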

Upvotes: 1
