Reputation: 13290
My application relied on this function to test whether a string is Korean:
const isKoreanWord = (input) => {
  const match = input.match(/[\u3131-\uD79D]/g);
  return match ? match.length === input.length : false;
}
isKoreanWord('만두'); // true
isKoreanWord('mandu'); // false
That worked until I added Chinese support; now the function gives wrong results:
isKoreanWord('幹嘛'); // true
I believe this happens because my range also covers Chinese characters, which sit between the Korean blocks in Unicode.
How should I correct this function so that it returns true only if the input contains nothing but Korean characters?
Upvotes: 7
Views: 5097
Reputation: 13014
In modern browsers, you can use Unicode property escapes directly:
const RE = /\p{sc=Hangul}/u
console.log(RE.test('만두')) // true
console.log(RE.test('mandu')) // false
console.log(RE.test('幹嘛')) // false
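Note that an unanchored pattern like the one above only checks whether the string contains at least one Hangul character, so a mixed string such as '만두mandu' would also pass. A minimal sketch of a whole-string check, assuming the goal is "every character is Hangul" as in the question (the names HANGUL_ONLY and isHangulOnly are made up for illustration):

```javascript
// Sketch: true only if every code point in the input belongs to the Hangul script.
// HANGUL_ONLY and isHangulOnly are hypothetical names, not part of the answer above.
// ^ and $ anchor the match to the whole string; + requires at least one character.
const HANGUL_ONLY = /^\p{Script=Hangul}+$/u;

const isHangulOnly = (input) => HANGUL_ONLY.test(input);

console.log(isHangulOnly('만두'));      // true
console.log(isHangulOnly('mandu'));     // false
console.log(isHangulOnly('幹嘛'));      // false
console.log(isHangulOnly('만두mandu')); // false (mixed input is rejected)
console.log(isHangulOnly(''));          // false (empty string has no Hangul)
```

The regex has no g flag, so repeated calls to test() do not depend on any leftover lastIndex state.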
Upvotes: 1
Reputation: 6324
A shorter version that matches Korean characters:
const regexKorean = /[\u1100-\u11FF\u3130-\u318F\uA960-\uA97F\uAC00-\uD7AF\uD7B0-\uD7FF]/g
Upvotes: 2
Reputation: 2395
Here are the Unicode ranges you need for Hangul (taken from the Hangul Wikipedia page):
U+AC00–U+D7AF
U+1100–U+11FF
U+3130–U+318F
U+A960–U+A97F
U+D7B0–U+D7FF
So your call to .match should look like this:
const match = input.match(/[\uac00-\ud7af]|[\u1100-\u11ff]|[\u3130-\u318f]|[\ua960-\ua97f]|[\ud7b0-\ud7ff]/g);
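For completeness, here is a sketch of the question's function with those five ranges plugged in. It keeps the original match-and-compare-length approach; folding the ranges into one character class and the constant name HANGUL are my own choices, not something the question or the answers prescribe:

```javascript
// Sketch: the five Hangul blocks listed above, merged into a single character class.
// HANGUL is a hypothetical constant name used only for this example.
const HANGUL = /[\u1100-\u11FF\u3130-\u318F\uA960-\uA97F\uAC00-\uD7AF\uD7B0-\uD7FF]/g;

const isKoreanWord = (input) => {
  // With the g flag, match() returns every matching character, or null if none match.
  const match = input.match(HANGUL);
  // All of these ranges are single UTF-16 code units, so comparing the
  // number of matches to input.length tells us whether every character matched.
  return match ? match.length === input.length : false;
};

console.log(isKoreanWord('만두'));  // true
console.log(isKoreanWord('mandu')); // false
console.log(isKoreanWord('幹嘛'));  // false
console.log(isKoreanWord('만두!')); // false (2 matches, 3 characters)
```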
Upvotes: 16