Farhad-Taran

Reputation: 6512

Regex to allow non-ASCII and foreign letters?

Is it possible to create a regular expression that allows non-ASCII letters along with the Latin alphabet, for example Chinese or Greek characters (e.g. A汉语AbN漢語 should be allowed)?

I currently have the following, which only allows Latin letters: ^[\w\d][\w\d_\-\.\s]*$

Upvotes: 4

Views: 3126

Answers (1)

Tim Pietzcker

Reputation: 336128

In .NET,

^[\p{L}\d_][\p{L}\d_.\s-]*$

is equivalent to your regex, additionally allowing other Unicode letters.

Explanation:

\p{L} is a shorthand for the Unicode property "Letter".

Caveat: I think you meant to disallow the underscore as the initial character (it appears explicitly only in the second character class). Since \w includes the underscore, though, your regex did allow it. If that was your intent, remove the underscore from the first character class in my solution (\p{L} does not include it, of course).
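To make the caveat concrete, here is a small sketch that runs the question's original pattern in JavaScript, where \w covers only ASCII letters, digits and the underscore. The variable name and the sample strings are made up for illustration; only the pattern comes from the question.

```javascript
// The pattern from the question, unchanged.
// In JavaScript, \w means [A-Za-z0-9_], so the first class already admits "_".
const originalPattern = /^[\w\d][\w\d_\-\.\s]*$/;

// A leading underscore is accepted, because \w includes it.
console.log(originalPattern.test("_name"));

// A leading hyphen is rejected: "-" is only in the second class.
console.log(originalPattern.test("-name"));

// Non-ASCII letters are rejected here, since \w is ASCII-only in JavaScript.
console.log(originalPattern.test("汉语"));
```

If the underscore should not be allowed in first position, the first class has to exclude it explicitly, for example by spelling out the letters and digits instead of using \w.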

In ECMAScript, things are not so easy. Before ES2018 there is no \p{L}, so you have to define your own Unicode character ranges (ES2018 and later accept \p{L} when the regex carries the u flag). Fortunately, a fellow Stack Overflow user has already risen to the occasion and designed a JavaScript regex converter:

https://stackoverflow.com/a/8933546/20670
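If you can target an engine that implements ES2018 Unicode property escapes, the .NET pattern above carries over almost verbatim. The following is a minimal sketch under that assumption. The names unicodeNamePattern and isValidName are hypothetical, and the u flag is required for \p{L} to be recognised.

```javascript
// Assumption: an ES2018+ engine. Without the `u` flag, \p{L} is not a property escape.
// First character: any Unicode letter, an ASCII digit, or "_".
// Remaining characters: the same, plus ".", whitespace and "-".
const unicodeNamePattern = /^[\p{L}\d_][\p{L}\d_.\s-]*$/u;

// Hypothetical helper: returns true when the whole string matches the pattern.
function isValidName(text) {
  return unicodeNamePattern.test(text);
}

console.log(isValidName("A汉语AbN漢語")); // the sample string from the question
console.log(isValidName("αβγ"));          // Greek letters
console.log(isValidName("-abc"));         // hyphen is not allowed in first position
```

Note that \d stays ASCII-only in JavaScript even with the u flag, so digits from other scripts are not matched by this sketch. For older engines, the converter linked above is still the way to go.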

Upvotes: 6
