cloudygoose

Reputation: 668

How can C read Chinese from the console and from a file?

I'm using Ubuntu 12.04.
I want to know how I can read Chinese text using C.

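  #include <locale.h>
  #include <stdio.h>
  int main(void)
  {
    char st1[100];
    int b, max_w = sizeof st1;
    setlocale(LC_ALL, "zh_CN.UTF-8");
    if (scanf("%99s", st1) != 1) return 1;   /* bounded read of one word */
    for (b = 0; b < max_w; b++)
    {
      printf("%d ", st1[b]);   /* each byte printed as a plain (possibly signed) char */
      if (st1[b] == 0)
          break;
    }
    return 0;
  }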

With this code, English input is printed fine, but if I enter Chinese such as "的", it outputs:

Enter word or sentence (EXIT to break): 的
target char seq :
-25 -102 -124 0

I'm wondering why there are negative values in the array.
I also found that the bytes of "的" read from a file with fscanf are different from the bytes read from the console.

Upvotes: 0

Views: 153

Answers (2)

Olaf Dietsche

Reputation: 74028

UTF-8 encodes characters with a variable number of bytes (one to four). This is why you see three bytes for the character 的.

At graphemica - 的, you can see that 的 is the code point U+7684, which is encoded in UTF-8 as the three bytes E7 9A 84.

You print every byte separately as an integer value. Whether plain char is signed is implementation-defined; with GCC on x86 Linux it is signed, so bytes from 0x80 to 0xFF become negative numbers when they are converted to int. In your case:

  • -25 = E7
  • -102 = 9A
  • -124 = 84

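You can print the bytes as hex values with %x or as unsigned integers with %u, but cast each byte to unsigned char first (or use the hh length modifier, as in %hhx); otherwise the sign-extended value is printed, e.g. ffffffe7 instead of e7 with a 32-bit int.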

You can also change your print statement to

printf("%d ", (unsigned char) st1[b]);

which will interpret the bytes as unsigned values and show your output as

231 154 132 0

Upvotes: 3

R.. GitHub STOP HELPING ICE

Reputation: 215221

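There's no need (and in fact it's harmful) to hard-code a specific locale name; call setlocale(LC_ALL, "") so the program uses the locale configured in the user's environment. Which characters you can read does not depend on the locale's language (which is used for things like messages), and any locale with UTF-8 encoding should work fine.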

The easiest way to make this work (though it gets ugly once you try to go too far with it) is to use the wide-character stdio functions (e.g. getwc) instead of the byte-oriented ones. Otherwise, you can read bytes and then convert them with mbrtowc.

Upvotes: 0
