pikausp

Reputation: 1192

C# special characters

I need to verify that a string doesn't contain any special characters such as #, %, ™, etc. The strings are names, surnames, and similar, but restricting them to [a-zA-Z] won't do, because letters like ščřž... are allowed.

At the moment I'd go with something like

bool NonSpecial(string text){
    // Wrap the escaped list in a character class and test the input against it
    return !Regex.Match(text, "[" + Regex.Escape("!#@$%^&......") + "]").Success;
}

but that just seems to be too complicated and clumsy.

Is there any simpler and/or more elegant way?

Update: So after reading all the replies I decided to go with

private bool IsName( string text ) {
    return Regex.Match( text, @"^[\p{L}\p{Nd}'\.\- ]+$" ).Success
        && !Regex.Match( text, @"['\-\.]{2}" ).Success
        && !Regex.Match( text, "  " ).Success;
}

Basically, the name can contain letters, digits, ', ., -, and spaces; any two of ', ., - must be separated by at least one other allowed character, and there cannot be two spaces in a row.

Hope that's correct.
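For anyone wanting to try the rules above, here is a minimal sketch that restates the same three checks with Regex.IsMatch so it compiles on its own. The class name NameRules is made up for illustration; the patterns are the ones from the update.

```csharp
using System.Text.RegularExpressions;

public static class NameRules
{
    // Same three checks as IsName in the update:
    // 1. only letters, decimal digits, ', ., - and spaces;
    // 2. no two of ', -, . in a row;
    // 3. no two spaces in a row.
    public static bool IsName(string text) =>
        Regex.IsMatch(text, @"^[\p{L}\p{Nd}'\.\- ]+$")
        && !Regex.IsMatch(text, @"['\-\.]{2}")
        && !Regex.IsMatch(text, "  ");
}
```

With these rules, "O'Brien-Smith" and "J. R. R. Tolkien" are accepted, while "Anne--Marie", "John  Smith" (two spaces), "R2#D2", and the empty string are rejected. One thing to be aware of: in .NET, $ also matches just before a final newline, so "Smith\n" passes; replacing ^ and $ with \A and \z closes that gap.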

Upvotes: 2

Views: 818

Answers (4)

Ilya Kozhevnikov

Reputation: 10432

Have you tried text.All(Char.IsLetter)?

PS http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/
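A small sketch of how that one-liner might be wrapped, assuming a helper class named LetterOnlyCheck (a made-up name). The only addition is the empty-string guard, since All() is vacuously true on an empty sequence.

```csharp
using System.Linq;

public static class LetterOnlyCheck
{
    // True only when the string is non-empty and every char is a Unicode letter.
    // Without the guard, "".All(char.IsLetter) would return true.
    public static bool IsLettersOnly(string text) =>
        !string.IsNullOrEmpty(text) && text.All(char.IsLetter);
}
```

This accepts "ščřž" but rejects "O'Brien" and "Anne Marie", because apostrophes and spaces are not letters, so on its own it is probably too strict for real names.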

Upvotes: 4

MethodMan

Reputation: 18843

Try LINQ with a lambda as well; it's pretty straightforward.

This returns true if the string contains any character that is not a letter:

bool result = text.Any(x => !char.IsLetter(x));

Upvotes: 0

Douglas

Reputation: 54887

You can use the Unicode category for letters:

Regex.Match(text, @"\p{L}+");

See Supported Unicode Categories.
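Note that the unanchored pattern succeeds as soon as the text contains one letter anywhere. To validate the whole string, the pattern needs anchors. A minimal sketch, with the class and method names invented for illustration:

```csharp
using System.Text.RegularExpressions;

public static class UnicodeLetterRegex
{
    // \A and \z pin the match to the entire string; \p{L} is any Unicode letter.
    private static readonly Regex LettersOnly = new Regex(@"\A\p{L}+\z");

    public static bool IsLettersOnly(string text) =>
        text != null && LettersOnly.IsMatch(text);

    // Unanchored form: true if at least one letter occurs anywhere in the text.
    public static bool ContainsLetter(string text) =>
        text != null && Regex.IsMatch(text, @"\p{L}+");
}
```

For "R2#D2", ContainsLetter returns true while IsLettersOnly returns false, which is the difference that matters for validation.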

Upvotes: 2

Joel Coehoorn

Reputation: 415735

This problem is worse than you imagine.

There are literally thousands of allowable characters that can legitimately be part of a name, spread over hundreds of ranges in the various Unicode alphabets.

There are also literally tens of thousands of characters that will never be part of a name. Think of all the emoji and ASCII art characters. These are also spread over hundreds of separate ranges of Unicode characters.

Sifting the wheat from the chaff via manual code, even regular expressions, just isn't going to work well.

Thankfully, this work has been done for you. Look at the char.IsLetter() method.

You may also want to have an exception for the various allowed separator characters and accents that are not letters, but can be part of a name: hyphens, apostrophes, and periods are legitimate, and all have more than one allowed Unicode encoding. Unfortunately, I don't have a quick solution for you here. This may have to be a best-effort approach, looking at just some of the more common ones.
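As a rough illustration of what such a best-effort check might look like, here is a sketch that combines char.IsLetter() with a small allow-list. The class name, method names, and the particular separators chosen are assumptions for the example, and the list is deliberately incomplete.

```csharp
using System.Globalization;
using System.Linq;

public static class NameCharCheck
{
    // Hypothetical, incomplete allow-list: space, hyphen-minus, ASCII apostrophe,
    // right single quotation mark (U+2019), and period.
    private const string Separators = " -'\u2019.";

    private static bool IsAllowed(char c) =>
        char.IsLetter(c)
        // Combining accents such as U+0301 are marks, not letters.
        || char.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark
        || Separators.IndexOf(c) >= 0;

    // Best effort only: checks each char in isolation, not how they are arranged.
    public static bool LooksLikeName(string text) =>
        !string.IsNullOrWhiteSpace(text) && text.All(IsAllowed);
}
```

This accepts "O\u2019Brien-Smith" and a decomposed "Jose\u0301", and rejects "R2#D2". Because it inspects one char at a time, letters outside the Basic Multilingual Plane (stored as surrogate pairs) are rejected; handling those would need the char.IsLetter(string, int) overload.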

Upvotes: 1
