Reputation: 193
In C# code, I am trying to validate a string that contains Chinese characters: " 中文ABC123".
When I use the general alphanumeric pattern "^[a-zA-Z0-9\s]+$", the string "中文ABC123" does not match and regex validation fails.
What do I need to add to the expression in C#?
Upvotes: 19
Views: 20345
Reputation: 7263
Thanks to @Andie2302 for pointing out the right way to do it.
In addition, many languages have combining characters that need a base character to attach to. For example, if you keep only \p{L} characters from the Thai word 'เก็บ', you get only 'เกบ'; the mark above the second letter is lost.
That is why \p{L} alone
does not work for every language.
To support almost every language, put both categories inside the character class:
\p{L}\p{M}
NOTE:
L stands for 'Letter' (all letters from all languages, not including marks)
M stands for 'Mark' (a mark cannot be displayed alone; it needs a letter to attach to)
If you also need numbers, add the following:
\p{N}
NOTE:
N stands for 'Number'
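A minimal sketch of the point above, assuming the three categories are combined in one character class. The class and method names (UnicodeMarkDemo and its members) are hypothetical and only illustrate the idea; the Thai word is written with \u escapes so the source file encoding does not matter.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class UnicodeMarkDemo
{
    // The Thai word from the answer:
    // U+0E40 (letter), U+0E01 (letter), U+0E47 (combining mark), U+0E1A (letter).
    public const string ThaiWord = "\u0E40\u0E01\u0E47\u0E1A";

    // True only if every character is in category L (Letter).
    public static bool IsLettersOnly(string s) =>
        Regex.IsMatch(s, @"^\p{L}+$");

    // True if every character is a letter or a mark.
    public static bool IsLettersAndMarks(string s) =>
        Regex.IsMatch(s, @"^[\p{L}\p{M}]+$");

    // True if every character is a letter, a mark, or a number.
    public static bool IsLettersMarksNumbers(string s) =>
        Regex.IsMatch(s, @"^[\p{L}\p{M}\p{N}]+$");

    // Removes every character that is not in category L.
    public static string KeepLetters(string s) =>
        Regex.Replace(s, @"[^\p{L}]", "");
}
```

With these helpers, IsLettersOnly(ThaiWord) is false because of the combining mark, IsLettersAndMarks(ThaiWord) is true, and KeepLetters(ThaiWord) returns the three-letter string without the mark.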
Thanks to this website for the very useful information:
https://www.regular-expressions.info/unicode.html
Upvotes: 3
Reputation: 4887
To match any letter from any language, use:
\p{L}
If you also want to match numbers:
[\p{L}\p{Nd}]+
\p{L}
... matches a character in the Unicode category 'Letter'.
It is shorthand for [\p{Ll}\p{Lu}\p{Lt}\p{Lm}\p{Lo}]
\p{Ll}
... matches lowercase letters. (abc)
\p{Lu}
... matches uppercase letters. (ABC)
\p{Lt}
... matches titlecase letters.
\p{Lm}
... matches modifier letters.
\p{Lo}
... matches letters without case. (中文)
\p{Nd}
... matches a character in the Unicode category 'Decimal Digit'.
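The categories listed above can be checked one at a time with a small helper. This is only a sketch: CategoryDemo and IsInCategory are hypothetical names, and the sample characters in the note below are my own picks, not taken from the answer.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class CategoryDemo
{
    // True if every character of 'text' belongs to the given Unicode
    // category, for example "Lu", "Lo", "Nd" or the whole group "L".
    public static bool IsInCategory(string text, string category) =>
        Regex.IsMatch(text, @"^\p{" + category + @"}+$");
}
```

For example, IsInCategory("\u4E2D\u6587", "Lo") is true for 中文, while IsInCategory("\u4E2D\u6587", "Lu") is false because those letters have no case. \p{Nd} also covers decimal digits outside ASCII: IsInCategory("\u0663", "Nd") is true for the Arabic-Indic digit three, whereas the Roman numeral U+2167 is in category N but not in Nd.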
Just replace: ^[a-zA-Z0-9\s]+$
with ^[\p{L}0-9\s]+$
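As a rough illustration of that replacement, here is a sketch that runs both patterns against the string from the question. InputValidator and its members are hypothetical names; the Chinese characters are written with \u escapes so the file encoding does not matter.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class InputValidator
{
    // The ASCII-only pattern from the question.
    public const string OldPattern = @"^[a-zA-Z0-9\s]+$";

    // The suggested replacement: any Unicode letter, ASCII digits, whitespace.
    public const string NewPattern = @"^[\p{L}0-9\s]+$";

    // "\u4E2D\u6587" is 中文, so this is the string 中文ABC123.
    public const string Sample = "\u4E2D\u6587ABC123";

    // True if the whole input matches the given pattern.
    public static bool IsValid(string input, string pattern) =>
        Regex.IsMatch(input, pattern);
}
```

IsValid(Sample, OldPattern) is false, while IsValid(Sample, NewPattern) is true. Punctuation is still rejected by both patterns, so IsValid("\u4E2D\u6587-ABC", NewPattern) is false.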
Upvotes: 44