Reputation: 1192
I need to verify that a string doesn't contain any special characters such as #,%...™ etc. Basically these are name/surname (and similar) strings; however, sticking to [a-zA-Z] won't do, since characters like ščřž... are allowed.
At the moment I'd go with something like:
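using System.Text.RegularExpressions;

bool NonSpecial(string text) {
    // Forbidden set "!#@$%^&......" (rest elided); Regex.Escape makes each character literal
    return !Regex.IsMatch(text, "[" + Regex.Escape("!#@$%^&") + "]");
}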
but that just seems to be too complicated and clumsy.
Is there any simpler and/or more elegant way?
Update: After reading all the replies, I decided to go with:
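private bool IsName(string text) {
    // Letters (\p{L}), decimal digits (\p{Nd}), ' . - and space only; \z (unlike $) also rejects a trailing "\n"
    return Regex.IsMatch(text, @"^[\p{L}\p{Nd}'\.\- ]+\z")
        && !Regex.IsMatch(text, @"['\-\.]{2}")  // no two of ' - . in a row
        && !Regex.IsMatch(text, " {2}");         // no two spaces in a row
}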
Basically, the name can contain letters, digits, apostrophes, periods, hyphens and spaces; no two of ', . and - may be adjacent (they must be separated by at least one other allowed character), and there cannot be two spaces in a row.
Hope that's correct.
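For comparison, the rules described in the update can also be checked in a single pass without regular expressions. This is only a sketch: the class name NameRules is hypothetical, and it assumes char.IsLetter and char.IsDigit are close enough stand-ins for \p{L} and \p{Nd} (they test the same Unicode categories, though the regex engine and char use separate Unicode tables).

```csharp
// Hypothetical helper: single-pass check of the rules described in the update
public static class NameRules
{
    // Punctuation allowed in a name, but never two of these in a row
    private const string Punctuation = "'.-";

    public static bool IsName(string text)
    {
        // The regex version requires at least one character (+), so "" fails here too
        if (string.IsNullOrEmpty(text)) return false;

        char prev = '\0'; // sentinel: not punctuation, not a space
        foreach (char c in text)
        {
            bool isPunct = Punctuation.IndexOf(c) >= 0;

            // char.IsLetter ~ \p{L}, char.IsDigit ~ \p{Nd}
            if (!char.IsLetter(c) && !char.IsDigit(c) && !isPunct && c != ' ')
                return false;

            // Two punctuation characters in a row, e.g. "--" or ".'"
            if (isPunct && Punctuation.IndexOf(prev) >= 0)
                return false;

            // Two spaces in a row
            if (c == ' ' && prev == ' ')
                return false;

            prev = c;
        }
        return true;
    }
}
```

With this sketch, NameRules.IsName("O'Neil-Smith") is true, while NameRules.IsName("Jean--Luc") is false.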
Upvotes: 2
Views: 818
Reputation: 10432
Have you tried text.All(Char.IsLetter)?
PS http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/
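A minimal sketch of the All suggestion (the class and method names are hypothetical). It accepts letters from any script, but it also rejects spaces, hyphens and apostrophes, and Enumerable.All is vacuously true for an empty sequence, which is why a non-empty variant is shown as well.

```csharp
using System.Linq;

public static class LetterChecks
{
    // The suggestion as written: true when every char is a Unicode letter (any script).
    // Enumerable.All returns true for an empty sequence, so "" passes.
    public static bool AllLetters(string text) => text.All(char.IsLetter);

    // Variant that additionally rejects null and the empty string
    public static bool NonEmptyAllLetters(string text) =>
        !string.IsNullOrEmpty(text) && text.All(char.IsLetter);
}
```

For example, "Dvořák" and "Иван" pass, while "Jean-Luc" and "Mary Ann" do not.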
Upvotes: 4
Reputation: 18843
Try using LINQ with a lambda as well; it's pretty straightforward.
This returns true if the string contains at least one character that is not a letter:
bool result = text.Any(x => !char.IsLetter(x));
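A small sketch (hypothetical names) showing that the Any form answers the opposite question from All: by De Morgan's law, "some character is not a letter" is the same as "not every character is a letter", so the two methods below agree on every input, including the empty string, where both return false.

```csharp
using System.Linq;

public static class NonLetterCheck
{
    // True as soon as one char is NOT a letter (digit, space, punctuation, symbol, ...)
    public static bool ContainsNonLetter(string text) => text.Any(x => !char.IsLetter(x));

    // Same result via De Morgan: "any non-letter" == "not all letters"
    public static bool ContainsNonLetterViaAll(string text) => !text.All(char.IsLetter);
}
```

So to accept a name, the caller would negate the result: !ContainsNonLetter(text).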
Upvotes: 0
Reputation: 54887
You can use the Unicode category for letters, anchored so that every character must be a letter:
Regex.IsMatch(text, @"^\p{L}+\z");
See Supported Unicode Categories.
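Anchoring matters here. A sketch with hypothetical method names: without ^ and an end anchor, \p{L}+ succeeds as soon as the text contains any letter, so "Ann#" passes. \z is used rather than $ because in .NET $ (without RegexOptions.Multiline) also matches just before a final newline.

```csharp
using System.Text.RegularExpressions;

public static class LetterRegex
{
    // Unanchored: succeeds if the text contains at least one letter anywhere
    public static bool ContainsLetters(string text) => Regex.IsMatch(text, @"\p{L}+");

    // Anchored: every character from the start (^) to the very end (\z) must be a letter
    public static bool OnlyLetters(string text) => Regex.IsMatch(text, @"^\p{L}+\z");
}
```

ContainsLetters("Ann#") is true, OnlyLetters("Ann#") is false, and OnlyLetters("Ann\n") is also false.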
Upvotes: 2
Reputation: 415735
This problem is worse than you imagine.
There are literally thousands of allowable characters that can legitimately be part of a name, spread over hundreds of ranges in the various Unicode alphabets.
There are also literally tens of thousands of characters that will never be part of a name. Think of all the emoji and ASCII-art characters. These are also spread over hundreds of separate ranges of Unicode characters.
Sifting the wheat from the chaff via manual code, even regular expressions, just isn't going to work well.
Thankfully, this work has been done for you. Look at the char.IsLetter() method.
You may also want to make an exception for the various separator characters and accents that are not letters but can be part of a name: hyphens, apostrophes and periods are legitimate, and each has more than one allowed Unicode encoding. Unfortunately, I don't have a quick solution for you here. This may have to be a best-effort approach that handles just some of the more common ones.
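A best-effort sketch along these lines, under explicit assumptions: the separator list (ASCII and typographic apostrophe, hyphen-minus and U+2010 hyphen, period, space) is a hypothetical, incomplete choice, and accents are accepted when they arrive as separate combining marks (Unicode categories Mn/Mc). Because it checks individual UTF-16 chars, letters outside the Basic Multilingual Plane are rejected along with emoji. The class and method names are hypothetical.

```csharp
using System.Globalization;

public static class BestEffortName
{
    // Hypothetical, deliberately incomplete separator list: ASCII apostrophe,
    // right single quotation mark (U+2019), hyphen-minus, hyphen (U+2010), period, space.
    // U+02BC MODIFIER LETTER APOSTROPHE needs no entry: it is already a letter (category Lm).
    private const string Separators = "'\u2019-\u2010. ";

    public static bool LooksLikeName(string text)
    {
        if (string.IsNullOrWhiteSpace(text)) return false;

        foreach (char c in text)
        {
            // Letters from any script (char-based, so non-BMP letters are rejected)
            if (char.IsLetter(c)) continue;

            // Accents stored as separate combining marks, e.g. "e" + U+0301
            UnicodeCategory cat = CharUnicodeInfo.GetUnicodeCategory(c);
            if (cat == UnicodeCategory.NonSpacingMark || cat == UnicodeCategory.SpacingCombiningMark)
                continue;

            // One of the allowed separator encodings
            if (Separators.IndexOf(c) >= 0) continue;

            // Anything else: digits, symbols such as the trademark sign, emoji surrogates, ...
            return false;
        }
        return true;
    }
}
```

If the name is also stored, normalizing it with string.Normalize() (NFC by default) keeps precomposed and decomposed spellings of the same accented letter consistent.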
Upvotes: 1