Reputation: 4237
I use the search() method of the string object to find a match between a regular expression and a string.
It works fine for English words:
"google".search(/\bg/g) // return 0
But this code doesn't work for Japanese strings:
"アイスランド語".search(/\bア/g) // return -1
How can I change the regex to find a match between Japanese strings and a regular expression?
Upvotes: 4
Views: 2603
Reputation: 111890
Sadly, JavaScript's regex engine is "ASCII only". Unicode is not supported, in the sense that non-ASCII Unicode characters aren't divided into classes, so \d is only 0-9, for example. If you need advanced (Unicode-aware) regexes in JavaScript, you can try http://xregexp.com/
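To make the "ASCII only" point concrete, here is a minimal sketch. The first check uses only the classic shorthand class. The other two assume a newer engine that supports the `u` flag with `\p{...}` property escapes (ES2018 or later), which this answer does not rely on, so treat them as an optional alternative to a library.

```javascript
// U+FF13 FULLWIDTH DIGIT THREE: a decimal digit in Unicode, but not ASCII.
const fullwidthThree = "３";

// The built-in shorthand \d only covers 0-9.
const asciiDigitMatch = /\d/.test(fullwidthThree);

// Assumption: the engine supports the `u` flag and \p{...} (ES2018+).
// \p{Nd} is the Unicode "decimal number" category.
const unicodeDigitMatch = /\p{Nd}/u.test(fullwidthThree);

// Script-based classes work the same way under that assumption.
const katakanaMatch = /\p{Script=Katakana}/u.test("ア");

console.log(asciiDigitMatch, unicodeDigitMatch, katakanaMatch);
// prints: false true true
```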
And we won't even delve into the problem of surrogate pairs. A "character" in JavaScript is a UTF-16 code unit, so it isn't always a full Unicode character. Fortunately, Japanese should be entirely in the BMP (but note that some of the unified Han ideograph extensions are in Plane 2, so each of those characters takes two UTF-16 code units).
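A small sketch of what that means in practice. The Plane 2 sample character (U+20BB7) is my own choice for illustration, and the `\u{...}` escape and the `u` flag assume an ES2015 or later engine.

```javascript
const katakana = "\u30A2";     // ア, inside the BMP
const planeTwo = "\u{20BB7}";  // a Plane 2 ideograph (CJK Extension B)

// .length counts UTF-16 code units, not Unicode characters.
const katakanaUnits = katakana.length;
const planeTwoUnits = planeTwo.length;

// Array.from iterates by code point, so the surrogate pair stays together.
const planeTwoCodePoints = Array.from(planeTwo).length;

// Without the `u` flag, "." matches a single code unit only.
const dotWithoutU = /^.$/.test(planeTwo);
const dotWithU = /^.$/u.test(planeTwo);

console.log(katakanaUnits, planeTwoUnits, planeTwoCodePoints);
// prints: 1 2 1
console.log(dotWithoutU, dotWithU);
// prints: false true
```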
If you want to read something about Unicode, you could start with the Wikipedia article "Mapping of Unicode characters", for example.
Upvotes: 4
Reputation: 54649
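The problem is the \b. \b only matches at a position where one side is a word character and the other side is not, which includes the start or end of the string when the adjacent character is a word character
(see: http://www.regular-expressions.info/wordboundaries.html)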
And in JavaScript a word character is the character class [a-zA-Z0-9_] (ref / Word Boundaries / ECMA = ASCII). ア is not in that class, so there is no word boundary in front of it and /\bア/ can never match.
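One possible workaround, sketched under assumptions: replace \b with a negative lookbehind that treats any Unicode letter or digit as a word character. `searchWordStart` is a hypothetical helper name, and the pattern needs an engine with lookbehind and `\p{...}` support (ES2018 or later).

```javascript
// Hypothetical helper: index of `prefix` where it is not preceded by a
// Unicode letter, digit, or underscore; -1 if there is no such position.
function searchWordStart(text, prefix) {
  // Escape regex syntax characters so the prefix is matched literally.
  const escaped = prefix.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // (?<!...) is a negative lookbehind; \p{L} and \p{N} need the `u` flag.
  const pattern = new RegExp(`(?<![\\p{L}\\p{N}_])${escaped}`, "u");
  return text.search(pattern);
}

console.log("google".search(/\bg/));                  // prints: 0
console.log("アイスランド語".search(/\bア/));          // prints: -1
console.log(searchWordStart("google", "g"));          // prints: 0
console.log(searchWordStart("アイスランド語", "ア"));  // prints: 0
console.log(searchWordStart("アイスランド語", "ラ"));  // prints: -1
```

Keep in mind that Japanese text usually has no spaces between words, so this only detects the start of a run of letters, not real word boundaries. If you only need "starts with", anchoring with ^ instead of \b is simpler.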
Upvotes: 3