Seninha

Reputation: 329

Assign Unicode character to a char

I want to do the following assignment:

char complete = '█', blank='░';

But I got the following warning (I'm using the latest version of gcc):

trabalho3.c: In function ‘entrar’:
trabalho3.c:243:9: warning: multi-character character constant [-Wmultichar]
   char complete = '█', blank='░';
                    ^
trabalho3.c:243:3: warning: overflow in implicit constant conversion [-Woverflow]
   char complete = '█', blank='░';
                    ^
trabalho3.c:244:23: warning: multi-character character constant [-Wmultichar]
   char complete = '█', blank='░';
                             ^
trabalho3.c:244:17: warning: overflow in implicit constant conversion [-Woverflow]
   char complete = '█', blank='░';
                             ^

How can I do this assignment?

Upvotes: 1

Views: 938

Answers (2)

Davislor

Reputation: 15164

You can store those characters as:

  • a UTF-8 string, const unsigned char complete[] = u8"█";
  • a wide character defined in <wchar.h>, const wchar_t complete = L'█';
  • a UTF-32 character defined in <uchar.h>, const char32_t complete = U'█';
  • a UTF-16 character, although this is generally a bad idea.

Use UTF-8 when you can, something else when you have to. The 32-bit type is the only one that guarantees fixed width. There are functions in the standard library to read and write wide-character strings, and in many locales, you can read and write UTF-8 strings just like ASCII once you call setlocale() or convert them to wide characters with mbstowcs().
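As a minimal sketch of the UTF-8 option, the two glyphs can be kept as short strings rather than single char values. The render_bar() helper below is hypothetical (the question never says what the glyphs are for; a progress bar is only an assumption), and the universal character names \u2588 and \u2591 are used so that the source file itself stays ASCII:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Each glyph is a 3-byte UTF-8 sequence plus a terminating NUL. */
static const char COMPLETE[] = u8"\u2588"; /* U+2588 FULL BLOCK  */
static const char BLANK[]    = u8"\u2591"; /* U+2591 LIGHT SHADE */

/* The copy loop below relies on both glyphs having the same byte length. */
static_assert(sizeof COMPLETE == sizeof BLANK, "glyphs differ in UTF-8 length");

/* Hypothetical helper: writes `filled` full blocks followed by
   `width - filled` light-shade blocks into `out` as a NUL-terminated
   UTF-8 string. Returns the number of bytes written (NUL excluded),
   or 0 if `out` is NULL or too small. */
size_t render_bar(char *out, size_t out_size, size_t filled, size_t width)
{
    const size_t glyph_len = sizeof COMPLETE - 1; /* bytes per glyph, NUL excluded */

    if (filled > width)
        filled = width;
    if (out == NULL || out_size < width * glyph_len + 1)
        return 0;

    size_t pos = 0;
    for (size_t i = 0; i < width; i++) {
        memcpy(out + pos, i < filled ? COMPLETE : BLANK, glyph_len);
        pos += glyph_len;
    }
    out[pos] = '\0';
    return pos;
}
```

The resulting buffer can be passed to printf("%s") or fputs() like any other string; whether the blocks display correctly depends on the terminal being set to UTF-8.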

Upvotes: 0

Jonathan Leffler

Reputation: 755026

When I copy those lines from the posting and echo the result through a hex dump program, the output is:

0x0000: 63 68 61 72 20 63 6F 6D 70 6C 65 74 65 20 3D 20   char complete = 
0x0010: 27 E2 96 88 27 2C 20 62 6C 61 6E 6B 3D 27 E2 96   '...', blank='..
0x0020: 91 27 3B 0A                                       .';.
0x0024:

And when I run it through a UTF-8 decoder, the two block characters are identified as:

0xE2 0x96 0x88 = U+2588 (FULL BLOCK)
0xE2 0x96 0x91 = U+2591 (LIGHT SHADE)

And if the characters are indeed 3 bytes long, trying to store all three bytes in a single char, which holds only one byte, is going to cause problems.
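The decoding above can be reproduced by hand. The following is a minimal sketch of a UTF-8 decoder for a single sequence; decode_utf8() is a hypothetical helper written for illustration, not a standard library function, and it does not reject overlong encodings or surrogate code points:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decodes one UTF-8 sequence starting at `s`.
   Stores the code point in `*cp` and returns the number of bytes
   consumed, or 0 if the sequence is malformed. `s` must point to a
   NUL-terminated buffer so a truncated sequence is detected safely. */
size_t decode_utf8(const unsigned char *s, uint32_t *cp)
{
    size_t len;

    assert(s != NULL && cp != NULL);

    /* The lead byte gives the sequence length and the top payload bits. */
    if (s[0] < 0x80)                { *cp = s[0];        len = 1; }
    else if ((s[0] & 0xE0) == 0xC0) { *cp = s[0] & 0x1F; len = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { *cp = s[0] & 0x0F; len = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { *cp = s[0] & 0x07; len = 4; }
    else return 0; /* stray continuation byte or invalid lead byte */

    /* Each continuation byte has the form 10xxxxxx and adds 6 bits. */
    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        *cp = (*cp << 6) | (s[i] & 0x3F);
    }
    return len;
}
```

Fed the bytes E2 96 88, this returns 3 and yields the code point 0x2588, which matches the decoder output quoted above; the same three-byte length is why neither glyph fits in one char.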

You need to validate these observations; there is a lot of potential for the data being filtered between your system and mine. However, the chances are that if you take a look at the source code using similar tools, you will find that the characters are either UTF-8 or UTF-16 encoded, and neither of these will fit into a single byte. If you think they are characters in a single-byte code set (CP-1252 or something similar, perhaps), you should show the hex dump for the line of code containing the initializations, and identify the platform and code set you're working with.

Upvotes: 2
