Gabe

Reputation: 50493

Filter a String

I want to make sure a string has only characters in this range

[a-z] && [A-Z] && [0-9] && [-]

so all letters and numbers plus the hyphen. I tried this...

C# App:

        char[] filteredChars = { ',', '!', '@', '#', '$', '%', '^', '&', '*', '(', ')', '_', '+', '=', '{', '}', '[', ']', ':', ';', '"', '\'', '?', '/', '.', '<', '>', '\\', '|' };
        string s = str.TrimStart(filteredChars);

TrimStart() only seems to work with letters, not with other characters like $, %, etc.

Did I implement it wrong? Is there a better way to do it?

I just want to avoid looping over every index of each string to check it, because there will be a lot of strings to process...

Thoughts?

Thanks!

Upvotes: 19

Views: 47020

Answers (6)

Tore Aurstad

Reputation: 3816

I have tested these two solutions in LINQPad 5. Their benefit is that they work not only for integers but also for decimals and floats, whose decimal separator is culture-dependent. For example, in Norway we use the comma as the decimal separator, whereas the US uses the dot and reserves the comma for the thousands separator. First the LINQ version, then the Regex version. The most verbose part is reading the number separator from the current thread's culture; you can shorten it with a using static directive at the top of the file or, better, move the functionality into C# extension methods, preferably with overloads that accept arbitrary Regex patterns.

string crappyNumber = @"40430dfkZZZdfldslkggh430FDFLDEFllll340-DIALNOWFORCHRISTSAKE.,CAKE-FORFIRSTDIAL920932903209032093294faøj##R#KKL##K";

string.Join("", crappyNumber.Where(c => char.IsDigit(c)|| c.ToString() == Thread.CurrentThread.CurrentCulture.NumberFormat.NumberDecimalSeparator)).Dump();

// Test each character against a class of digits plus the escaped culture decimal separator.
new string(crappyNumber.Where(c => Regex.IsMatch(c.ToString(), $"[\\d{Regex.Escape(Thread.CurrentThread.CurrentCulture.NumberFormat.NumberDecimalSeparator)}]")).ToArray()).Dump();

A note on the code above: the Dump() method writes the result to LINQPad's output, so your own code will omit that last call. Also note that each version is a one-liner, but it is still a bit verbose and can be moved into C# extension methods as suggested.

Also, constructing a new string from the filtered characters is more compact and less error-prone than calling string.Join.

We got a crappy number as input, but we managed to get our number in the end! And it is Culture aware in C#!

Upvotes: 0

Judah Gabriel Himango

Reputation: 60021

Here's a fun way to do it with LINQ - no ugly loops, no complicated RegEx:

private string GetGoodString(string input)
{
   var allowedChars = 
      Enumerable.Range('0', 10).Concat(
      Enumerable.Range('A', 26)).Concat(
      Enumerable.Range('a', 26)).Concat(
      Enumerable.Range('-', 1));

   var goodChars = input.Where(c => allowedChars.Contains(c));
   return new string(goodChars.ToArray());
}

Feed it "Hello, world? 123!" and it will return "Helloworld123".

Upvotes: 18

Tomas Aschan

Reputation: 60594

This seems like a perfectly valid reason to use a regular expression.

bool stringIsValid = Regex.IsMatch(inputString, @"^[a-zA-Z0-9\-]*?$");

In response to miguel's comment, you could do this to remove all unwanted characters:

string cleanString = Regex.Replace(inputString, @"[^a-zA-Z0-9\-]", "");

Note that the caret (^) is now placed inside the character class, thus negating it (matching any non-allowed character).
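Since the question mentions a large number of strings, it may be worth building the pattern once and reusing it. The following is only a sketch under that assumption; the class and method names (StringCleaner, Clean, IsValid) are hypothetical and not part of the answer above, and whether RegexOptions.Compiled actually helps depends on the workload.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, named here for illustration only.
public static class StringCleaner
{
    // Built once and reused for every call. RegexOptions.Compiled trades
    // a higher construction cost for faster matching afterwards.
    private static readonly Regex Disallowed =
        new Regex(@"[^a-zA-Z0-9\-]", RegexOptions.Compiled);

    // Removes every character that is not an ASCII letter, digit, or hyphen.
    public static string Clean(string input) => Disallowed.Replace(input, "");

    // True when the string contains no disallowed character
    // (an empty string counts as valid).
    public static bool IsValid(string input) => !Disallowed.IsMatch(input);
}
```

With this sketch, StringCleaner.Clean("Hello, world? 123!") returns "Helloworld123", and StringCleaner.IsValid("abc-123") returns true.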

Upvotes: 42

JaredPar

Reputation: 754763

Try the following

public bool isStringValid(string input) {
  if ( null == input ) { 
    throw new ArgumentNullException("input");
  }
  return System.Text.RegularExpressions.Regex.IsMatch(input, @"^[A-Za-z0-9\-]*$");
}

Upvotes: 3

Joel

Reputation: 16655

I'm sure that with a bit more time you can come up with something better, but this will give you a good idea:

public string NumberOrLetterOnly(string s)
{
    string rtn = s;
    for (int i = 0; i < s.Length; i++)
    {
        if (!char.IsLetterOrDigit(rtn[i]) && rtn[i] != '-')
        {
            rtn = rtn.Replace(rtn[i].ToString(), " ");
        }
    }
    return rtn.Replace(" ", "");
}

Upvotes: 1

miguel

Reputation: 3009

Why not just use Replace instead? TrimStart will only remove the leading characters in your list...
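To make the difference concrete, here is a small sketch contrasting the two behaviours. The class and method names (TrimVersusFilter, TrimLeading, KeepAllowed) are made up for illustration, and the allowed set is assumed to be the ASCII letters, digits, and hyphen from the question.

```csharp
using System;
using System.Linq;

// Hypothetical demo class, named here for illustration only.
public static class TrimVersusFilter
{
    // TrimStart stops at the first character that is not in the trim set,
    // so unwanted characters later in the string are left untouched.
    public static string TrimLeading(string input) =>
        input.TrimStart('$', '%', '!');

    // Filtering looks at every character, wherever it appears.
    public static string KeepAllowed(string input)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));
        return new string(input.Where(IsAllowed).ToArray());
    }

    // Assumed allowed set: a-z, A-Z, 0-9, and the hyphen.
    private static bool IsAllowed(char c) =>
        (c >= 'a' && c <= 'z') ||
        (c >= 'A' && c <= 'Z') ||
        (c >= '0' && c <= '9') ||
        c == '-';
}
```

For the input "$%ab$c!", TrimLeading returns "ab$c!" (only the leading "$%" is removed), while KeepAllowed returns "abc".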

Upvotes: 3
