Reputation: 2173
I need a way to check whether a string contains Japanese or Chinese text.
Currently I'm using this:
string.match(/[\u3400-\u9FBF]/);
but it does not work for strings such as ディアボリックラヴァーズ or バッテリー.
Could you help me with that?
Thanks
Upvotes: 17
Views: 28259
Reputation: 726
This may help if you need to differentiate between CJK languages:
function detectCJKLanguage(str) {
  // Kana (hiragana, katakana, phonetic extensions, half-width katakana) is specific to Japanese
  const japaneseRegex = /[\u3040-\u30FF\u31F0-\u31FF\uFF66-\uFF9F]/;
  // Hangul (jamo, compatibility jamo, syllables) is specific to Korean
  const koreanRegex = /[\u1100-\u11FF\u3130-\u318F\uAC00-\uD7AF]/;
  // Han ideographs are shared by Chinese, Japanese (kanji), and Korean (hanja)
  const hanRegex = /[\u4E00-\u9FFF]/;
  // Test the script-specific ranges first: Japanese text usually mixes kanji
  // with kana, so testing the Han range first would label it as Chinese
  if (japaneseRegex.test(str)) {
    return { isCJK: true, language: "Japanese" };
  } else if (koreanRegex.test(str)) {
    return { isCJK: true, language: "Korean" };
  } else if (hanRegex.test(str)) {
    // Han-only text is most likely Chinese, but this is only a heuristic
    return { isCJK: true, language: "Chinese" };
  } else {
    return { isCJK: false, language: "Non-CJK" };
  }
}
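As an alternative to hard-coded ranges, engines that support ES2018 Unicode property escapes can test for scripts by name. The following is only a sketch under that assumption: the names SCRIPT_PATTERNS, detectScripts, and guessCJKLanguage are hypothetical and not part of the answer above, and treating Han-only text as Chinese is a heuristic, since kanji-only Japanese or hanja-only Korean text would get the same label.

```javascript
// Hypothetical helpers (illustration only): script detection with
// Unicode property escapes, which require the `u` flag (ES2018+).
const SCRIPT_PATTERNS = {
  Han: /\p{Script=Han}/u,
  Hiragana: /\p{Script=Hiragana}/u,
  Katakana: /\p{Script=Katakana}/u,
  Hangul: /\p{Script=Hangul}/u,
};

// Returns the names of the scripts that occur at least once in str.
function detectScripts(str) {
  return Object.keys(SCRIPT_PATTERNS).filter((name) =>
    SCRIPT_PATTERNS[name].test(str)
  );
}

// Heuristic: kana implies Japanese, hangul implies Korean,
// and Han-only text is assumed (not proven) to be Chinese.
function guessCJKLanguage(str) {
  const scripts = detectScripts(str);
  if (scripts.includes("Hiragana") || scripts.includes("Katakana")) {
    return "Japanese";
  }
  if (scripts.includes("Hangul")) {
    return "Korean";
  }
  if (scripts.includes("Han")) {
    return "Chinese";
  }
  return "Non-CJK";
}

console.log(detectScripts("ABC検診センター")); // [ 'Han', 'Katakana' ]
console.log(guessCJKLanguage("バッテリー")); // Japanese
console.log(guessCJKLanguage("한국어")); // Korean
```

Returning the list of scripts rather than a single label lets the caller decide how to treat mixed or ambiguous text.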
Upvotes: 0
Reputation: 542
You can use this code; it works for me.
let str = "渣打銀行提供一系列迎合你生活需要嘅信用卡";
//let str = "SGGRAND DING HOUSE 4GRAND DING HOUSE";
const REGEX_CHINESE = /[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uf900-\ufaff\uff66-\uff9f]/;
const hasChinese = str.match(REGEX_CHINESE);
if (hasChinese) {
  alert("Found");
} else {
  alert("Not Found");
}
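The same check can be wrapped in a small function so it can be reused outside the browser. This is a minimal sketch: the names CJK_REGEX and containsCJK are made up for illustration, and the pattern is the one from the snippet above, which covers kana as well as Han characters despite the "Chinese" in the original constant name.

```javascript
// Same ranges as above: kana, CJK Extension A, CJK Unified Ideographs,
// CJK Compatibility Ideographs, and half-width katakana.
const CJK_REGEX =
  /[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uf900-\ufaff\uff66-\uff9f]/;

// Hypothetical helper: true if str contains at least one matching character.
function containsCJK(str) {
  return CJK_REGEX.test(str);
}

console.log(containsCJK("渣打銀行提供一系列迎合你生活需要嘅信用卡")); // true
console.log(containsCJK("SGGRAND DING HOUSE 4GRAND DING HOUSE")); // false
console.log(containsCJK("バッテリー")); // true
```

Using test() instead of match() returns a boolean directly and avoids building a match array that is never used.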
Upvotes: 4
Reputation: 163
For Swift 4, I changed the pattern to Swift's \u{...} escape syntax and used NSRegularExpression to do the replacement. Maybe this will help someone:
[\u{3040}-\u{30ff}\u{3400}-\u{4dbf}\u{4e00}-\u{9fff}\u{f900}-\u{faff}\u{ff66}-\u{ff9f}]
Extension methods on String:
import Foundation

extension String {
    mutating func removeRegexMatches(pattern: String, replaceWith: String = "") {
        do {
            let regex = try NSRegularExpression(pattern: pattern, options: .caseInsensitive)
            // NSRange counts UTF-16 code units, so build it from the string instead of self.count
            let range = NSRange(startIndex..., in: self)
            self = regex.stringByReplacingMatches(in: self, options: [], range: range, withTemplate: replaceWith)
        } catch {
            // Invalid pattern: leave the string unchanged
            return
        }
    }

    mutating func removeEastAsianChars() {
        let regexPatternEastAsianCharacters = "[\u{3040}-\u{30ff}\u{3400}-\u{4dbf}\u{4e00}-\u{9fff}\u{f900}-\u{faff}\u{ff66}-\u{ff9f}]"
        removeRegexMatches(pattern: regexPatternEastAsianCharacters)
    }
}
Example (the methods are mutating, so the string must be a var); the resulting string is ABC:
var name = "ABC検診センター"
name.removeEastAsianChars()
Upvotes: 4
Reputation:
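The ranges of Unicode characters which are routinely used for Chinese and Japanese text are:
U+3040 - U+30FF: hiragana and katakana (Japanese only)
U+3400 - U+4DBF: CJK Unified Ideographs Extension A (Chinese, Japanese, and Korean)
U+4E00 - U+9FFF: CJK Unified Ideographs (Chinese, Japanese, and Korean)
U+F900 - U+FAFF: CJK Compatibility Ideographs (Chinese, Japanese, and Korean)
U+FF66 - U+FF9F: half-width katakana (Japanese only)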
As a regular expression, this would be expressed as:
/[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uf900-\ufaff\uff66-\uff9f]/
This does not include every character which will appear in Chinese and Japanese text, but any significant piece of typical Chinese or Japanese text will be mostly made up of characters from these ranges.
Note that this regular expression will also match on Korean text that contains hanja. This is an unavoidable result of Han unification.
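The two caveats above can be checked directly. The sketch below reuses the same pattern. The constant names CJ_REGEX and KANA_REGEX are hypothetical, and the kana-only pattern is an added illustration of how Japanese-specific characters could be separated from the shared Han ranges.

```javascript
// The pattern from this answer.
const CJ_REGEX =
  /[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uf900-\ufaff\uff66-\uff9f]/;
// Assumption for illustration: only the kana ranges, which are Japanese-specific.
const KANA_REGEX = /[\u3040-\u30ff\uff66-\uff9f]/;

// Korean written in hanja matches, because hanja are Han ideographs.
console.log(CJ_REGEX.test("大韓民國")); // true
// Korean written in hangul only does not match.
console.log(CJ_REGEX.test("한국어")); // false

// Characters outside the listed ranges are missed, for example the
// ideographic full stop U+3002 and the Extension B ideograph U+20000.
console.log(CJ_REGEX.test("\u3002")); // false
console.log(CJ_REGEX.test("\u{20000}")); // false

// Kana identifies Japanese text. Han characters alone do not.
console.log(KANA_REGEX.test("バッテリー")); // true
console.log(KANA_REGEX.test("大韓民國")); // false
```

A string that passes CJ_REGEX but fails KANA_REGEX contains Han characters without kana, so it could be Chinese, kanji-only Japanese, or Korean hanja.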
Upvotes: 33