Ran Deloun

Reputation: 507

Detect Chinese characters in Java

Using Java, how can I detect whether a String contains Chinese characters?

String chineseStr = "已下架";

if (isChineseString(chineseStr)) {
    System.out.println("The string contains Chinese characters");
} else {
    System.out.println("The string does not contain Chinese characters");
}

Can you please help me to solve the problem?

Upvotes: 20

Views: 30973

Answers (3)

ccpizza

Reputation: 31801

A more direct approach:

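// matches() must match the entire string, so this is true only when
// every character falls in the range U+4E00..U+9FA5.
if ("粽子".matches("[\\u4E00-\\u9FA5]+")) {
    System.out.println("is Chinese");
}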

If you also need to catch rarely used and exotic characters, you will need to add all the relevant ranges; see the question "What's the complete range for Chinese characters in Unicode?"
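As a rough sketch of the regex route without hand-listing ranges: Java's regex engine (Java 7 and later) accepts the script property `\p{IsHan}`. The class name `HanRegexCheck` and the helpers `containsHan` and `isAllHan` below are made up for illustration. Note the difference between `find()` (contains at least one) and `matches()` (the whole string).

```java
import java.util.regex.Pattern;

public class HanRegexCheck {
    // \p{IsHan} matches any code point whose Unicode script is Han.
    private static final Pattern HAN = Pattern.compile("\\p{IsHan}");

    // True if at least one Han character occurs anywhere in s.
    static boolean containsHan(String s) {
        return HAN.matcher(s).find();
    }

    // True only if s is non-empty and consists entirely of Han characters.
    static boolean isAllHan(String s) {
        return s.matches("\\p{IsHan}+");
    }

    public static void main(String[] args) {
        System.out.println(containsHan("xxx已下架xxx")); // true
        System.out.println(isAllHan("xxx已下架xxx"));    // false
        System.out.println(isAllHan("粽子"));            // true
    }
}
```

Which of the two you want depends on whether "is Chinese" means "contains some Chinese" or "is only Chinese".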

Upvotes: 4

Joop Eggen

Reputation: 109613

Since Java 7, Character.isIdeographic(int codepoint) tells whether the code point is a CJKV (Chinese, Japanese, Korean and Vietnamese) ideograph.

A closer fit is Character.UnicodeScript.HAN, which identifies the Han script specifically.

So:

System.out.println(containsHanScript("xxx已下架xxx"));

public static boolean containsHanScript(String s) {
    for (int i = 0; i < s.length(); ) {
        int codepoint = s.codePointAt(i);
        i += Character.charCount(codepoint);
        if (Character.UnicodeScript.of(codepoint) == Character.UnicodeScript.HAN) {
            return true;
        }
    }
    return false;
}

Or in Java 8:

public static boolean containsHanScript(String s) {
    return s.codePoints().anyMatch(
            codepoint ->
            Character.UnicodeScript.of(codepoint) == Character.UnicodeScript.HAN);
}
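To illustrate why the code above walks code points rather than chars, here is a small sketch (the class name `HanScriptDemo` and the helper `containsIdeograph` are invented for this example). It uses U+20000, a CJK Extension B character stored as a surrogate pair, which a BMP-only range such as [\u4E00-\u9FA5] does not cover.

```java
public class HanScriptDemo {
    // Same check as above: any code point in the Han script.
    static boolean containsHanScript(String s) {
        return s.codePoints().anyMatch(
                cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN);
    }

    // Alternative check based on the Ideographic property.
    static boolean containsIdeograph(String s) {
        return s.codePoints().anyMatch(Character::isIdeographic);
    }

    public static void main(String[] args) {
        // U+20000 lies outside the BMP, so it occupies two chars in a String.
        String extB = new String(Character.toChars(0x20000));

        System.out.println(extB.length());                       // 2
        System.out.println(extB.matches("[\\u4E00-\\u9FA5]+"));  // false
        System.out.println(containsHanScript(extB));             // true
        System.out.println(containsIdeograph("已下架"));          // true
        System.out.println(containsHanScript("hello"));          // false
    }
}
```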

Upvotes: 50

Ruchira Gayan Ranaweera

Reputation: 35577

You can try the Google API or the Language Detection API.

The Language Detection API includes a simple demo; you can try that first.

Upvotes: 0
