Reputation: 29
I am new to programming and have been working through the examples in my C++ textbook. I could do most of them, but I ran into problems when I tried to display Chinese characters in a program similar to "Hello World!".
The question is about input/output of non-ASCII characters such as Simplified Chinese. Here is what I have attempted so far:
I was running the "Hello world!" program in Code::Blocks using C++ and replaced the text "Hello world" with the Chinese characters "你好". I ran the program, but in the command prompt the output was just gibberish (乱码). So I searched online and found out that I had to change my regional setting to "Simplified, China". I did this, rebooted my computer and ran the program again. This time the output did contain Chinese characters, but they were the wrong ones (these: 浣犲ソ锛), and I believe some of it is Japanese as well. Some Chinese-language resources online said this is the encoding of "你好", but I'm not too sure. I just want the text I write after std::cout << (as in std::cout << "---\n";) to display correctly, the way it does when I use English. How can I get the Command Prompt to display what I write in Code::Blocks?
Lastly, there was a prompt that popped up stating that the encoding was changed because I used illegal characters...
Upvotes: 0
Views: 1566
Reputation: 5848
Having tried the following:
#include <iostream>

int main()
{
    std::cout << "你好" << std::endl;
    return 0;
}
I got the output:
你好
To me these appear to be the same characters (I apologise if I am missing a difference that you can see). This makes me think that the problem is a mismatch between two conversions: the character-to-byte conversion used when the file is saved and/or compiled, and the byte-to-character conversion used to display the output when the program runs.
My correct output was produced on Xubuntu using g++ 4.8.4. The cpp file was saved with vim, and it looks like this:
00000000: 23 69 6e 63 6c 75 64 65 20 3c 69 6f 73 74 72 65 #include <iostre
00000010: 61 6d 3e 0a 0a 69 6e 74 20 6d 61 69 6e 28 29 0a am>..int main().
00000020: 7b 0a 09 73 74 64 3a 3a 63 6f 75 74 20 3c 3c 20 {..std::cout <<
00000030: 22 e4 bd a0 e5 a5 bd 22 20 3c 3c 20 73 74 64 3a "......" << std:
00000040: 3a 65 6e 64 6c 3b 0a 09 72 65 74 75 72 6e 20 30 :endl;..return 0
00000050: 3b 0a 7d 0a -- -- -- -- -- -- -- -- -- -- -- -- ;.}.------------
As you can see at offset 0x31, each character gets saved as a sequence of 3 bytes of UTF-8: e4 bd a0 for 你 and e5 a5 bd for 好.
Since at one point you got 4 characters of text, I believe that these bytes actually get compiled as UTF-8 just fine, but are then read as something else. If they were read as UTF-16, the 6 bytes would produce 3 characters (2 bytes per character). That is not a likely scenario, though, both because the standard is designed to avoid such confusion and because you actually got 4 characters, and UTF-16 cannot use fewer than 2 bytes per character.
At this point I do not have enough information to help you further. Please consider providing the exact code that you are trying to compile and, if possible, a hexadecimal dump of the source file as well.
Upvotes: 1