Shiju Shaji

Reputation: 1730

How to replace all unwanted characters in a string using RegEx?

In a C# application I need to replace all unwanted characters with "Ã". The following array lists the allowed characters.

string[] wantedCharacters = new string[] { " ", "!", "\"", "#", "$", "%", "&", "\'", "(", ")", "*", "+", ",", "-", ".", "/", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", ":", ";", "<", "=", ">", "?", "@", "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z", "[", "\\", "]", "^", "_", "`", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "{", "|", "}", "~" };

Every character not in this array should be replaced with "Ã". I have done this by looping over all the characters in the string, but it takes too long to execute. I am looking for a regular expression to do this. Any help will be appreciated.

Upvotes: 3

Views: 9207

Answers (3)

MarcinJuraszek

Reputation: 125660

[^c] means: any character that is not c. Replace c with your allowed characters and pass that regex to the Replace method:

var reg = new Regex(@"[^ !""#$%&'()*+,-./0-9:;<=>?@A-Z\[\\\]^_`a-z{|}~]");
var result = reg.Replace(inputString, "Ã");

Upvotes: 4

nhahtdh

Reputation: 56829

It seems that you are trying to restrict the characters to the printable characters in ASCII (characters with code 0x20 to 0x7E). So you can use this regex:

[^\x20-\x7E]

The regex will match all unwanted characters.

Written as a C# verbatim string literal:

@"[^\x20-\x7E]"

Use this regex with the Replace function and replace with an empty string to remove all unwanted characters, or replace with a placeholder character of your choice.
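As a minimal sketch of the approach described above, the pattern can be wrapped in a small helper. The class name AsciiFilter and the method name ReplaceUnwanted are hypothetical, chosen only for illustration; the only library calls used are the Regex constructor and Regex.Replace.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class; the names are illustrative and not from the answer.
public static class AsciiFilter
{
    // Matches any single character outside the printable ASCII range 0x20-0x7E.
    private static readonly Regex Unwanted = new Regex(@"[^\x20-\x7E]");

    // Replaces every unwanted character with the given placeholder string.
    // Pass an empty string to remove the characters instead.
    public static string ReplaceUnwanted(string input, string placeholder)
    {
        return Unwanted.Replace(input, placeholder);
    }

    public static void Main()
    {
        // U+00E9 and the tab character both fall outside 0x20-0x7E.
        Console.WriteLine(ReplaceUnwanted("caf\u00E9\tbar", "?")); // prints "caf??bar"
        Console.WriteLine(ReplaceUnwanted("caf\u00E9\tbar", ""));  // prints "cafbar"
    }
}
```

Creating the Regex once and reusing it avoids rebuilding the pattern on every call. One caveat: Regex.Replace treats "$" in the replacement string as the start of a substitution, so a placeholder containing "$" would need to be written as "$$".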

Upvotes: 4

abatishchev

Reputation: 100348

I would not use a regex here; it will be less readable.

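// Requires: using System.Collections.Generic; using System.Linq;
string input = "..";
char placeholderChar = 'Ã';
// wantedCharacters is a string[] of one-character strings; take each string's first char.
HashSet<char> wantedCharactersSet = new HashSet<char>(wantedCharacters.Select(s => s[0]));
// Strings are immutable in C#, so modify a char[] copy and build a new string from it.
char[] chars = input.ToCharArray();
for (int i = 0; i < chars.Length; i++)
{
    if (!wantedCharactersSet.Contains(chars[i]))
        chars[i] = placeholderChar;
}
string result = new string(chars);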

Note that HashSet<T>.Contains() runs in O(1) time on average, while searching an array is O(n) in the array's length.

Upvotes: 4
