jack3604

Reputation: 93

Removing all non-letter characters from a string in C#

I want to remove all non-letter characters from a string. By non-letter I mean anything that isn't a letter of the alphabet or an apostrophe. This is the code I have:

public static string RemoveBadChars(string word)
{
    char[] chars = new char[word.Length];
    for (int i = 0; i < word.Length; i++)
    {
        char c = word[i];

        if ((int)c >= 65 && (int)c <= 90)
        {
            chars[i] = c;
        }
        else if ((int)c >= 97 && (int)c <= 122)
        {
            chars[i] = c;
        }
        else if ((int)c == 39) // 39 is the apostrophe; 44 would be the comma
        {
            chars[i] = c;
        }
    }

    word = new string(chars);

    return word;
}

It's close, but doesn't quite work. The problem is this:

[in]: "(the"
[out]: " the"

It gives me a space there instead of the "(". I want to remove the character entirely.

Upvotes: 7

Views: 14321

Answers (6)

Richard Keene

Reputation: 402

word.Aggregate(new StringBuilder(word.Length), (acc, c) => acc.Append(Char.IsLetter(c) ? c.ToString() : "")).ToString();

You can substitute any other predicate for Char.IsLetter, for example one that also accepts the apostrophe.
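
As a rough sketch of that substitution, the predicate can be passed in as a Func<char, bool>. The class name AggregateFilter and the method name Filter are made up for illustration and are not part of the answer above.

```csharp
using System;
using System.Linq;
using System.Text;

public static class AggregateFilter
{
    // Hypothetical helper: keeps only the characters accepted by `keep`.
    public static string Filter(string word, Func<char, bool> keep)
    {
        return word.Aggregate(
                new StringBuilder(word.Length),
                // Append the character when the predicate accepts it,
                // otherwise pass the accumulator through unchanged.
                (acc, c) => keep(c) ? acc.Append(c) : acc)
            .ToString();
    }
}
```

Calling AggregateFilter.Filter("(the;':", c => Char.IsLetter(c) || c == '\'') should give the' for the question's kind of input.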

Upvotes: 0

Adel Mourad

Reputation: 1547

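This works, since the asker wants to remove non-letter characters. Note that \W matches non-word characters, so digits and underscores are kept and apostrophes are removed.

public static string RemoveNoneLetterChars(string word)
{
    // \W matches any character that is not a word character
    // (letters, digits, and the underscore count as word characters).
    Regex reg = new Regex(@"\W");
    return reg.Replace(word, String.Empty); // use " " instead to replace each match with a space
}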

Upvotes: 2

Dan

Reputation: 1001

You should use a regular expression (Regex) instead.

public static string RemoveBadChars(string word)
{
    Regex reg = new Regex("[^a-zA-Z']");
    return reg.Replace(word, string.Empty);
}

If you also want to keep spaces:

Regex reg = new Regex("[^a-zA-Z' ]");

Upvotes: 6

Joel Coehoorn

Reputation: 415840

private static Regex badChars = new Regex("[^A-Za-z']");

public static string RemoveBadChars(string word)
{
    return badChars.Replace(word, "");
}

This creates a Regular Expression that consists of a character class (enclosed in square brackets) that looks for anything that is not (the leading ^ inside the character class) A-Z, a-z, or '. It then defines a function that replaces anything that matches the expression with an empty string.

Upvotes: 2

Brandon Spilove

Reputation: 1569

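A regular expression would be shorter, but to answer your question: the problem is that you use the same index i for both the input string and the output array. Every character you skip leaves an unfilled slot (a '\0' character) at that position in chars. Keep a separate index for the output array and advance it only when you keep a character. So, something like this: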

public static string RemoveBadChars(string word)
{
    char[] chars = new char[word.Length];
    int myindex=0;
    for (int i = 0; i < word.Length; i++)
    {
        char c = word[i];

        if ((int)c >= 65 && (int)c <= 90)
        {
            chars[myindex] = c;
            myindex++;
        }
        else if ((int)c >= 97 && (int)c <= 122)
        {
            chars[myindex] = c;
            myindex++;
        }
        else if ((int)c == 39) // 39 is the apostrophe; 44 would be the comma
        {
            chars[myindex] = c;
            myindex++;
        }
    }

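    // Copy only the myindex characters that were written; the rest of
    // the array still holds '\0' padding.
    return new string(chars, 0, myindex);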
}
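
If tracking the output index by hand feels error-prone, a StringBuilder does the same bookkeeping for you. This is only a sketch of an alternative, not the code from the question: it uses char literals in place of the numeric codes, and the class name LoopFilter is made up.

```csharp
using System.Text;

public static class LoopFilter
{
    public static string RemoveBadChars(string word)
    {
        // The builder grows only when a character is kept, so no gaps
        // or trailing '\0' characters can end up in the result.
        var kept = new StringBuilder(word.Length);
        foreach (char c in word)
        {
            bool isLetter = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
            if (isLetter || c == '\'')
            {
                kept.Append(c);
            }
        }
        return kept.ToString();
    }
}
```

With this version, LoopFilter.RemoveBadChars("(the") returns the with no leading gap.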

Upvotes: 2

Grant Winney

Reputation: 66449

The Char class has a method that could help out. Use Char.IsLetter() to detect valid letters (and an additional check for the apostrophe), then pass the result to the string constructor:

var input = "(the;':";

var result = new string(input.Where(c => Char.IsLetter(c) || c == '\'').ToArray());

Output:

the'
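
One thing worth knowing: Char.IsLetter accepts any Unicode letter, not only A-Z and a-z, so its results can differ from a [^A-Za-z'] regex on accented input. Here is a small sketch of the difference; the class and method names are purely illustrative.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LetterFilterComparison
{
    // Matches everything except ASCII letters and the apostrophe.
    private static readonly Regex NotAsciiLetter = new Regex("[^A-Za-z']");

    // Keeps any Unicode letter (including accented ones) and apostrophes.
    public static string KeepUnicodeLetters(string input)
    {
        return new string(input.Where(c => Char.IsLetter(c) || c == '\'').ToArray());
    }

    // Keeps only A-Z, a-z, and apostrophes.
    public static string KeepAsciiLetters(string input)
    {
        return NotAsciiLetter.Replace(input, "");
    }
}
```

For the input "caf\u00E9's!" the first method keeps the accented letter and the second drops it, so which one is right depends on whether "letter" should mean the English alphabet only.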

Upvotes: 10
