Reputation: 537
I'm working on a JavaScript application that requires me to identify the set of "any visible Unicode letter characters, digits (0-9), spaces, underscores, and periods". The suggested regex pattern is ^[0-9\\p{L} _\\.]+$, but that doesn't seem to work in JavaScript. The part that is giving me trouble is "any visible Unicode letter characters", because that includes non-English characters. Is there a JavaScript regex pattern that can identify the Unicode letter character set?
Upvotes: 5
Views: 4491
Reputation: 626845
Use the XRegExp library to parse your current regular expression:
var pattern = new XRegExp("^[0-9\\p{L} _.]+$");
var s = "123 Московская Street.";
if (XRegExp.test(s, pattern)) {
  console.log("Valid");
}
<script src="https://cdnjs.cloudflare.com/ajax/libs/xregexp/3.2.0/xregexp-all.min.js"></script>
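As an alternative sketch, assuming the runtime supports ES2018 Unicode property escapes (recent browsers and Node.js versions do), the same check can be written with a native RegExp and the u flag, without loading XRegExp. The name basicPattern is only illustrative.

```javascript
// Assumption: the engine supports \p{...} property escapes (ES2018+).
// The "u" flag is required; without it, \p{L} is not treated as "any letter".
const basicPattern = /^[0-9\p{L} _.]+$/u;

// Digits, Cyrillic and Latin letters, spaces and a dot are all allowed.
console.log(basicPattern.test("123 Московская Street.")); // true

// "!" is not in the character class, so the whole string is rejected.
console.log(basicPattern.test("hello!")); // false
```

In a regex literal the backslash is written once (\p{L}); the doubled form (\\p{L}) is only needed inside a string literal.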
Note that ^[0-9\\p{L} _\\.]+$ matches:
^ - start of string
[0-9\\p{L} _\\.]+ - one or more chars that are:
    0-9 - ASCII digits
    \\p{L} - letters
    a space
    _ - an underscore
    . - a dot (inside a character class, . matches a literal dot, no need to escape)
$ - end of string.
If you also want to limit the length and disallow a specific word, you may extend the pattern to the following:
var pattern = new XRegExp("^(?!.*\\bRiot\\b)[0-9\\p{L} _\\.]{3,16}$");
                            ^^^^^^^^^^^^^^^^                ^^^^^^
where + (1 or more occurrences) is replaced with the {3,16} limiting quantifier (3 to 16 occurrences), and the (?!.*\\bRiot\\b) negative lookahead fails the match if the whole word Riot (whole because of the \\b word boundaries) appears anywhere inside the string (or line, since . matches any char but line break chars).
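A minimal sketch of how the length limit and the lookahead behave, written with a native RegExp and the u flag on the assumption that ES2018 property escapes are available. The names namePattern and isValidName are hypothetical helpers for illustration, not part of XRegExp.

```javascript
// Assumption: ES2018 Unicode property escapes are supported by the engine.
// (?!.*\bRiot\b) rejects any string containing "Riot" as a whole word;
// {3,16} limits the total length to 3..16 characters.
const namePattern = /^(?!.*\bRiot\b)[0-9\p{L} _.]{3,16}$/u;

// Hypothetical helper: returns true when the name passes all conditions.
function isValidName(name) {
  return namePattern.test(name);
}

console.log(isValidName("Московская 12")); // true: 13 allowed characters
console.log(isValidName("ab"));            // false: shorter than 3
console.log(isValidName("a".repeat(17)));  // false: longer than 16
console.log(isValidName("Riot Games"));    // false: contains the whole word Riot
console.log(isValidName("Rioter_01"));     // true: "Riot" is followed by a word char, so \b does not match
```

One caveat: \b is defined in terms of \w, which covers only ASCII letters, digits and the underscore, so a non-ASCII letter placed directly next to Riot still counts as a word boundary.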
Upvotes: 4