Liren Yeo

Reputation: 3451

C# Remove unwanted characters from a string

I have looked at other posts, but they all deal with a known set of unwanted characters. My case is the reverse: I have a set of characters that I want to keep, and everything else should be removed.

My code is way too messy:

private string RemoveUnwantedChar(string input)
{
    string correctString = "";

    for (int i = 0; i < input.Length; i++)
    {
        if (char.IsDigit(input[i]) || input[i] == '.' || input[i] == '-' || input[i] == 'n'
                || input[i] == 'u' || input[i] == 'm' || input[i] == 'k' || input[i] == 'M'
                || input[i] == 'G' || input[i] == 'H' || input[i] == 'z' || input[i] == 'V'
                || input[i] == 's' || input[i] == '%')
            correctString += input[i];
    }
    return correctString;
}

Characters that I want: 0123456789 and numkMGHzVs%-.

Upvotes: 1

Views: 1938

Answers (5)

Jeroen van Langen

Reputation: 22038

You could do something like this:

// create a lookup hashset
private static HashSet<char> _allowedChars = new HashSet<char>("0123456789numkMGHzVs%-.".ToArray());

private string FilterString(string str)
{
    // tempbuffer
    char[] buffer = new char[str.Length];
    int index = 0;

    // check each character
    foreach (var ch in str)
        if (_allowedChars.Contains(ch))
            buffer[index++] = ch;

    // return the new string.
    return new String(buffer, 0, index);
}

The trick is to create a HashSet once and use it to validate each character. The 'messy' way, as you said, creates a new string on every concatenation and will fragment memory. The lookup also replaces the long chain of conditions in the if statement that you wanted to avoid.
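
As a quick check of the buffer-based filter above, here is a minimal sketch of a console program that runs it on a sample string. The class name `FilterDemo` and the sample input are made up for illustration; the allowed set is the one from the question.

```csharp
using System;
using System.Collections.Generic;

public static class FilterDemo
{
    // Whitelist from the question: digits plus numkMGHzVs%-.
    private static readonly HashSet<char> AllowedChars =
        new HashSet<char>("0123456789numkMGHzVs%-.");

    public static string FilterString(string str)
    {
        // In the worst case every character is kept, so the input length is enough.
        char[] buffer = new char[str.Length];
        int index = 0;

        // Copy only the characters found in the whitelist.
        foreach (char ch in str)
            if (AllowedChars.Contains(ch))
                buffer[index++] = ch;

        // Build the result from the filled part of the buffer only.
        return new string(buffer, 0, index);
    }

    public static void Main()
    {
        // Hypothetical sample input; prints 2.4GHz-12.5%
        Console.WriteLine(FilterString("Freq: 2.4 GHz, -12.5 %"));
    }
}
```

Note that letters such as `n`, `m` or `s` are kept wherever they occur, so ordinary words in the input will leave stray characters in the result.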


If you like linq, you could do something like:

// create a lookup hashset
private static HashSet<char> _allowedChars = new HashSet<char>("0123456789numkMGHzVs%-.".ToArray());

private string FilterString2(string str)
{
    return new String(
        str.Where(ch => _allowedChars.Contains(ch)).ToArray());
}

But this is arguably less readable.

Upvotes: 2

VSDekar

Reputation: 1821

I like this clear and readable Regex solution.

public string RemoveUnwantedChar(string input) {
    return Regex.Replace(input, "[^0-9numkMGHzVs%\\-.]", "");
}
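
If the method is called often, one possible variation is to build the Regex once and reuse it. The following is only a sketch: the class name `RegexFilterDemo`, the field name `Unwanted` and the sample input are assumptions, and `RegexOptions.Compiled` is an optional trade-off (slower construction, faster matching). The pattern is the same negated character class as above, so `[^...]` matches every character that is not in the allowed set. One difference from the question's code: `0-9` matches ASCII digits only, while `char.IsDigit` also accepts other Unicode decimal digits.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexFilterDemo
{
    // Negated character class: matches anything that is NOT an allowed character.
    // The hyphen is escaped so it is treated as a literal, not as a range operator.
    private static readonly Regex Unwanted =
        new Regex(@"[^0-9numkMGHzVs%\-.]", RegexOptions.Compiled);

    public static string RemoveUnwantedChar(string input)
    {
        // Replace every unwanted character with nothing.
        return Unwanted.Replace(input, string.Empty);
    }

    public static void Main()
    {
        // Hypothetical sample input; prints 75%3.3V
        Console.WriteLine(RemoveUnwantedChar("Load = 75 % at 3.3 V"));
    }
}
```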

Upvotes: 1

w.b

Reputation: 11228

You can use LINQ:

var allowedChars = "0123456789numkMGHzVs%-.";
var result = String.Join("", input.Where(c => allowedChars.Any(x => x == c)));

Another option:

var result = String.Join("", input.Where(c => allowedChars.Contains(c)));

Upvotes: 8

Tim Schmelter

Reputation: 460018

You can use String.Concat + Enumerable.Where with HashSet<T>.Contains:

HashSet<char> AllowedChars = new HashSet<char>("0123456789numkMGHzVs%-.");
private string RemoveUnwantedChar(string input)
{
    return string.Concat(input.Where(AllowedChars.Contains));
}

Here's another efficient approach using a StringBuilder and a HashSet<T>:

HashSet<char> AllowedChars = new HashSet<char>("0123456789numkMGHzVs%-.");
private string RemoveUnwantedChar(string input)
{
    StringBuilder sb = new StringBuilder(input.Length);
    foreach (char c in input)
        if (AllowedChars.Contains(c))
            sb.Append(c);
    return sb.ToString();
}

Upvotes: 3

Dhunt

Reputation: 1594

If you are using LINQ you could do this:

char[] validChars = "0123456789numkMGHzVs%-.".ToArray();
var newString = "Teststring012";

string filtered = string.Join("", newString.Where(x => validChars.Contains(x)));

Upvotes: 1
