Andrew

Reputation: 537

JavaScript regex pattern for any visible unicode letter characters

I'm working on a JavaScript application that requires me to identify the set of "any visible Unicode letter characters, digits (0-9), spaces, underscores, and periods". The suggested regex pattern is ^[0-9\\p{L} _\\.]+$, but that doesn't seem to work in JavaScript. The part that is giving me trouble is "any visible Unicode letter characters" because that includes non-English characters. Is there some JavaScript regex pattern that can identify the Unicode letter character set?

Upvotes: 5

Views: 4491

Answers (1)

Wiktor Stribiżew

Reputation: 626845

Use the XRegExp library, which supports \\p{L}, to parse your current regular expression:

var pattern = new XRegExp("^[0-9\\p{L} _.]+$");
var s = "123 Московская Street.";
if (XRegExp.test(s, pattern)) {
    console.log("Valid");
}
<script src="https://cdnjs.cloudflare.com/ajax/libs/xregexp/3.2.0/xregexp-all.min.js"></script>

Note that ^[0-9\\p{L} _\\.]+$ matches

  • ^ - start of string
  • [0-9\\p{L} _\\.]+ - one or more chars that are:
    • 0-9 - ASCII digits
    • \\p{L} - letters
    • (space) - a space
    • _ - an underscore
    • . - a dot (inside a character class, . matches a literal dot, no need to escape)
  • $ - end of string.
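
As a side note, engines that implement ES2018 Unicode property escapes accept \p{L} natively once the u flag is set, so the same character class can be written as a regex literal without a library. The following is a minimal sketch under that assumption; isValidName is a hypothetical helper name used only for illustration, not part of any library.

```javascript
// Assumption: the runtime supports ES2018 Unicode property escapes.
// Without the "u" flag, \p{L} is not treated as a Unicode letter class.
const namePattern = /^[0-9\p{L} _.]+$/u;

// Hypothetical helper: true when every char is an ASCII digit,
// a Unicode letter, a space, an underscore, or a dot.
function isValidName(s) {
  return namePattern.test(s);
}

console.log(isValidName("123 Московская Street.")); // true
console.log(isValidName("hello!"));                 // false ("!" is not in the class)
```

In a regex literal the backslash is written once (\p{L}); the doubled form (\\p{L}) is only needed inside a string literal such as the one passed to XRegExp.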

If you want to also include the following conditions:

  • Names must be at least 3 characters long and no more than 16 characters long.
  • No player name can include the word "Riot" in it.

You may extend the pattern to the following:

var pattern = new XRegExp("^(?!.*\\bRiot\\b)[0-9\\p{L} _\\.]{3,16}$");
                            ^^^^^^^^^^^^^^^^                ^^^^^^

where the + quantifier (1 or more occurrences) is replaced with the {3,16} limiting quantifier (3 to 16 occurrences), and the (?!.*\\bRiot\\b) negative lookahead fails the match if the whole word Riot (whole because of the \\b word boundaries) appears anywhere inside the string (or line, since . matches any char except line break chars).
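
The extended pattern can be tried out the same way. This is a rough sketch that again assumes native support for \p{L} with the u flag instead of XRegExp; isValidPlayerName is a hypothetical name chosen for the example.

```javascript
// Assumption: ES2018 Unicode property escapes are available ("u" flag).
// (?!.*\bRiot\b) rejects strings containing the whole word "Riot";
// {3,16} limits the length to 3..16 characters.
const playerNamePattern = /^(?!.*\bRiot\b)[0-9\p{L} _.]{3,16}$/u;

// Hypothetical helper wrapping the pattern.
function isValidPlayerName(s) {
  return playerNamePattern.test(s);
}

console.log(isValidPlayerName("Андрей_42"));      // true
console.log(isValidPlayerName("ab"));             // false (shorter than 3)
console.log(isValidPlayerName("Riot Games"));     // false (whole word "Riot")
console.log(isValidPlayerName("Rioter"));         // true ("Riot" is not a whole word here)
console.log(isValidPlayerName("a".repeat(17)));   // false (longer than 16)
```

Note that the check is case-sensitive as written, so "riot" would pass, and that \b treats the underscore as a word character, so "_Riot_" would pass as well. Whether that is acceptable depends on the actual naming rules.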

Upvotes: 4
