Reputation: 15070
I have a working regex that matches ASCII alphanumeric characters:
string pattern = "^[a-zA-Z0-9]+$";
Match match = Regex.Match(input, pattern);
if (match.Success)
{
...
I want to extend this to apply the same concept but include all Latin characters (e.g. å, Ø, etc.).
I've read about Unicode scripts, and I've tried this:
string pattern = "^[{Latin}0-9]+$";
But it's not matching the input I expect. How do I match Latin Unicode characters using Unicode scripts or an alternative method?
Upvotes: 3
Views: 2289
Reputation: 48711
Unicode scripts are not supported by the .NET regex engine, but Unicode blocks are. That said, you can match all Latin characters using the regex below:
^[\p{IsBasicLatin}\p{IsLatin-1Supplement}\p{IsLatinExtended-A}\p{IsLatinExtended-B}0-9]+$
\p{IsBasicLatin}: U+0000–U+007F
\p{IsLatin-1Supplement}: U+0080–U+00FF
\p{IsLatinExtended-A}: U+0100–U+017F
\p{IsLatinExtended-B}: U+0180–U+024F
Or simply use ^[\u0000-\u024F0-9]+$.
As mentioned by @AnthonyFaull, you may also want to match \p{IsLatinExtendedAdditional},
the named block for U+1E00–U+1EFF, which contains 256 additional characters:
[ắẮằẰẵẴẳẲấẤầẦẫẪẩẨảẢạ ẠặẶậẬḁḀ ẚ ḃḂḅḄḇḆ ḉḈ ḋḊḑḐḍḌḓḒḏḎ ẟ ếẾềỀễỄểỂẽẼḝḜḗḖḕḔẻẺẹẸ ệỆḙḘḛḚ ḟḞ ḡḠ ḧḦḣḢḩḨḥḤḫḪẖ ḯḮỉỈịỊḭḬ ḱḰḳḲḵḴ ḷḶḹḸḽḼḻḺ ỻỺ ḿḾṁṀṃṂ ṅṄṇṆṋṊṉṈ ốỐồỒỗỖổỔṍṌṏṎṓṒṑṐỏỎớỚ ờỜỡỠởỞợỢọỌộỘ ṕṔṗṖ ṙṘṛṚṝṜṟṞ ṥṤṧṦṡṠṣṢṩṨẛ ẞ ẜ ẝ ẗṫṪṭṬṱṰṯṮ ṹṸṻṺủỦứỨừỪữỮửỬựỰụỤṳṲ ṷṶṵṴ ṽṼṿṾ ỽỼ ẃẂẁẀẘẅẄẇẆẉẈ ẍẌẋẊ ỳỲẙỹỸẏẎỷỶỵỴ ỿỾ ẑẐẓẒẕẔ]
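To make the block-based approach concrete, here is a minimal sketch that combines the four blocks above with \p{IsLatinExtendedAdditional}. The class name LatinBlockDemo and the helper IsLatinBlockText are made up for illustration; only the pattern comes from this answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LatinBlockDemo
{
    // Named Unicode blocks as listed in the answer; the helper name is hypothetical.
    private static readonly Regex LatinBlocks = new Regex(
        @"^[\p{IsBasicLatin}\p{IsLatin-1Supplement}\p{IsLatinExtended-A}" +
        @"\p{IsLatinExtended-B}\p{IsLatinExtendedAdditional}]+$");

    public static bool IsLatinBlockText(string input) => LatinBlocks.IsMatch(input);

    public static void Main()
    {
        // U+00E5 (å) and U+00D8 (Ø) are in Latin-1 Supplement; digits are in Basic Latin.
        Console.WriteLine(IsLatinBlockText("\u00E5\u00D8123")); // True
        // U+1EB0 (Ằ) is in Latin Extended Additional.
        Console.WriteLine(IsLatinBlockText("\u1EB0n"));         // True
        // Cyrillic letters are outside every listed block.
        Console.WriteLine(IsLatinBlockText("\u041F\u0440"));    // False
        // The space character is part of Basic Latin, so it matches too.
        Console.WriteLine(IsLatinBlockText("a b"));             // True
    }
}
```

The last case shows a consequence worth keeping in mind: the blocks are code point ranges, so they also contain punctuation, symbols and control characters, not only letters and digits. If only alphanumerics are wanted, replacing \p{IsBasicLatin} with a-zA-Z0-9 is one option.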
Upvotes: 5
Reputation: 128
I would use Unicode ranges.
As described by Wikipedia (https://en.wikipedia.org/wiki/Latin_script_in_Unicode), I would combine the letters of Latin-1 Supplement (00C0–00FF), Latin Extended-A (0100–017F) and Latin Extended-B (0180–024F) with your pattern for ASCII alphanumeric characters.
string pattern = "^[a-zA-Z0-9\\u00C0-\\u024F]+$";
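A short sketch of how this range-based pattern behaves, assuming the range is written with a plain ASCII hyphen (an en dash inside a character class is treated as a literal character, not a range operator). The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LatinRangeDemo
{
    // ASCII alphanumerics plus U+00C0..U+024F; helper name is illustrative only.
    private static readonly Regex LatinRange =
        new Regex(@"^[a-zA-Z0-9\u00C0-\u024F]+$");

    public static bool IsLatinAlnum(string input) => LatinRange.IsMatch(input);

    public static void Main()
    {
        Console.WriteLine(IsLatinAlnum("\u00E5\u00D81")); // True: å, Ø and a digit
        Console.WriteLine(IsLatinAlnum("a b"));           // False: space is not in the class
        Console.WriteLine(IsLatinAlnum("a\u2013b"));      // False: en dash (U+2013) is outside the range
        Console.WriteLine(IsLatinAlnum("\u00D7"));        // True: the multiplication sign sits inside the range
    }
}
```

The last line shows that U+00C0–U+024F is not purely letters: × (U+00D7) and ÷ (U+00F7) fall inside it, so they would need to be excluded separately if that matters.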
Upvotes: 1
Reputation: 3627
Use ^[\p{L}\s]+$
to match any Unicode letter or whitespace character.
Or use ^[\w\u00c0-\u017e]+$
to match any word character plus the Unicode characters from U+00C0 to U+017E (use charmap to find the Unicode character range you need).
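As a rough illustration of how these classes behave in .NET, the sketch below swaps \s for 0-9 to stay close to the question's alphanumeric requirement; that swap and all names here are assumptions for the example. It also shows that \w is Unicode-aware by default and only becomes ASCII-only under RegexOptions.ECMAScript.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LetterClassDemo
{
    // \p{L} matches a letter from any script, not just Latin.
    public static bool IsLettersOrDigits(string input) =>
        Regex.IsMatch(input, @"^[\p{L}0-9]+$");

    // Default .NET behavior: \w covers Unicode letters, digits and underscore-like connectors.
    public static bool IsWordChars(string input) =>
        Regex.IsMatch(input, @"^\w+$");

    // With RegexOptions.ECMAScript, \w is restricted to [a-zA-Z_0-9].
    public static bool IsWordCharsEcma(string input) =>
        Regex.IsMatch(input, @"^\w+$", RegexOptions.ECMAScript);

    public static void Main()
    {
        Console.WriteLine(IsLettersOrDigits("\u00E5\u00D81")); // True
        Console.WriteLine(IsLettersOrDigits("\u041F\u0440"));  // True: Cyrillic letters are also \p{L}
        Console.WriteLine(IsWordChars("\u00E5_1"));            // True
        Console.WriteLine(IsWordCharsEcma("\u00E5"));          // False
    }
}
```

Because \p{L} and the default \w accept letters from every script, they are broader than "Latin only"; the block or range approaches in the other answers are narrower.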
Upvotes: 2