Frank

Reputation: 2173

Check whether a string contains Japanese/Chinese characters

I need a way to check whether a string contains Japanese or Chinese text.

Currently I'm using this:

string.match(/[\u3400-\u9FBF]/);

but it does not match strings such as ディアボリックラヴァーズ or バッテリー.

Could you help me with that?

Thanks

Upvotes: 17

Views: 28259

Answers (4)

maxxx

Reputation: 726

This may help if you need to differentiate between CJK languages:

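function detectCJKLanguage(str) {
  // Kana: hiragana, katakana, katakana phonetic extensions, half-width katakana.
  // Full-width punctuation is left out because Chinese text uses it too.
  const japaneseRegex = /[\u3040-\u30FF\u31F0-\u31FF\uFF66-\uFF9F]/;
  // Hangul: jamo, compatibility jamo, syllables
  const koreanRegex = /[\u1100-\u11FF\u3130-\u318F\uAC00-\uD7AF]/;
  // Han ideographs, shared by Chinese, Japanese, and Korean
  const chineseRegex = /[\u4E00-\u9FFF]/;

  // Check kana and hangul first: Japanese and Korean text can also contain
  // Han ideographs, so testing for those first would label it as Chinese.
  if (japaneseRegex.test(str)) {
    return { isCJK: true, language: "Japanese" };
  } else if (koreanRegex.test(str)) {
    return { isCJK: true, language: "Korean" };
  } else if (chineseRegex.test(str)) {
    // Han only: reported as Chinese, although it could be kanji or hanja
    return { isCJK: true, language: "Chinese" };
  } else {
    return { isCJK: false, language: "Non-CJK" };
  }
}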

Upvotes: 0

wpmarts

Reputation: 542

You can use this code; it works for me.

let str = "渣打銀行提供一系列迎合你生活需要嘅信用卡";
//let str = "SGGRAND DING HOUSE 4GRAND DING HOUSE";
const REGEX_CHINESE = /[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uf900-\ufaff\uff66-\uff9f]/;
const hasChinese = str.match(REGEX_CHINESE);
if(hasChinese){
  alert("Found");
}
else{
  alert("Not Found");
}

Upvotes: 4

daviddna

Reputation: 163

For Swift 4, I adapted the pattern and used NSRegularExpression to replace the matches. It might help someone!

[\u{3040}-\u{30ff}\u{3400}-\u{4dbf}\u{4e00}-\u{9fff}\u{f900}-\u{faff}\u{ff66}-\u{ff9f}]

Extension methods:

import Foundation

extension String {
    mutating func removeRegexMatches(pattern: String, replaceWith: String = "") {
        do {
            let regex = try NSRegularExpression(pattern: pattern, options: [])
            // NSRange counts UTF-16 code units, not Characters
            let range = NSRange(location: 0, length: self.utf16.count)
            self = regex.stringByReplacingMatches(in: self, options: [], range: range, withTemplate: replaceWith)
        } catch {
            // Invalid pattern: leave the string unchanged
            return
        }
    }

    mutating func removeEastAsianChars() {
        let regexPatternEastAsianCharacters = "[\u{3040}-\u{30ff}\u{3400}-\u{4dbf}\u{4e00}-\u{9fff}\u{f900}-\u{faff}\u{ff66}-\u{ff9f}]"
        removeRegexMatches(pattern: regexPatternEastAsianCharacters)
    }
}

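Example (the resulting string is ABC). The methods are mutating, so they must be called on a variable, not on a string literal:

var text = "ABC検診センター"
text.removeEastAsianChars()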

Upvotes: 4

user149341

Reputation:

The ranges of Unicode characters which are routinely used for Chinese and Japanese text are:

  • U+3040 - U+30FF: hiragana and katakana (Japanese only)
  • U+3400 - U+4DBF: CJK unified ideographs extension A (Chinese, Japanese, and Korean)
  • U+4E00 - U+9FFF: CJK unified ideographs (Chinese, Japanese, and Korean)
  • U+F900 - U+FAFF: CJK compatibility ideographs (Chinese, Japanese, and Korean)
  • U+FF66 - U+FF9F: half-width katakana (Japanese only)

As a regular expression, this would be expressed as:

/[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uf900-\ufaff\uff66-\uff9f]/
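
As a minimal sketch of how this pattern might be applied to the strings from the question (the names `CJ_REGEX` and `containsChineseOrJapanese` are made up for illustration):

```javascript
// Ranges copied from the list above; both names are hypothetical.
const CJ_REGEX = /[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uf900-\ufaff\uff66-\uff9f]/;

function containsChineseOrJapanese(text) {
  // test() returns a boolean, unlike match(), which returns an array or null
  return CJ_REGEX.test(text);
}

console.log(containsChineseOrJapanese("ディアボリックラヴァーズ")); // true
console.log(containsChineseOrJapanese("バッテリー"));               // true
console.log(containsChineseOrJapanese("battery"));                  // false

// The pattern from the question starts at U+3400, so it skips the kana
// block (U+3040 - U+30FF) and misses katakana-only strings:
console.log(/[\u3400-\u9FBF]/.test("バッテリー"));                  // false
```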

This does not include every character which will appear in Chinese and Japanese text, but any significant piece of typical Chinese or Japanese text will be mostly made up of characters from these ranges.
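
If coverage beyond these ranges matters (for example, ideographs outside the Basic Multilingual Plane), one possible alternative is to match by Unicode script instead of by hard-coded ranges. The sketch below assumes a JavaScript engine that supports ES2018 Unicode property escapes with the `u` flag; `describeScripts` and the constant names are hypothetical.

```javascript
// Assumption: the engine supports \p{...} property escapes (ES2018, `u` flag).
// Script= is used rather than Script_Extensions= because shared CJK
// punctuation such as "、" carries several script extensions and would
// otherwise make Chinese text look like it contains kana.
const KANA = /[\p{Script=Hiragana}\p{Script=Katakana}]/u;
const HAN = /\p{Script=Han}/u;

// Hypothetical helper: reports which scripts occur in the text.
function describeScripts(text) {
  return { hasKana: KANA.test(text), hasHan: HAN.test(text) };
}

console.log(describeScripts("バッテリー"));   // { hasKana: true, hasHan: false }
console.log(describeScripts("渣打銀行"));     // { hasKana: false, hasHan: true }
console.log(describeScripts("hello"));        // { hasKana: false, hasHan: false }

// U+20BB7 lies in CJK Extension B, outside the ranges listed above,
// but it still has Script=Han.
console.log(HAN.test("\u{20BB7}"));           // true
```

Like the range-based pattern, this cannot tell Chinese from Japanese on Han characters alone; it only reports which scripts are present.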

Note that this regular expression will also match on Korean text that contains hanja. This is an unavoidable result of Han unification.

Upvotes: 33
