Fire Frost

Reputation: 135

Print multi-octet character in C

I'm trying to print some special characters in the console, but I have a problem:

If I want to print '│', for example, I get '^B' in the console.

The decimal value of '│' is 9474, and I realised that this character is encoded on 3 octets.

If I just do a printf("%c",9474), I get '^B' again.

One way I thought of solving the problem is to convert 9474 into bytes and then print each octet to get my │, but I don't have a clue how to do that.

Upvotes: 0

Views: 1264

Answers (2)

%c expects the corresponding argument to be an int, which is then converted to unsigned char. 9474 is 0x2502 in hex; on platforms where CHAR_BIT is 8, the conversion to unsigned char keeps just the least-significant byte, here 0x02, which is the "Start of Text" control character, echoed on terminals as Control-B, aka ^B.

If your locale environment has been set up properly, then at the beginning of the program set at least LC_CTYPE to the system locale, and print the character using %lc. For maximal compatibility, cast the character to wint_t:

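#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <wchar.h>
#include <locale.h>

int main(void) {
    /* Adopt the character encoding of the user's environment. */
    if (! setlocale(LC_CTYPE, "")) {
        perror("Unable to set the locale");
        exit(1);
    }
    /* %lc converts the wide character to the locale's multibyte encoding. */
    printf("%lc\n", (wint_t)9474);
}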

Upvotes: 1

Serge Ballesta

Reputation: 148975

Oops, multibyte character processing in C is not that easy and requires... byte analysis. Your character is the Unicode character U+2502 BOX DRAWINGS LIGHT VERTICAL, because 9474 is 0x2502.

When you do printf("%c",9474), you print the low-order byte of the int value 0x2502, so it is the same as printf("%c",2), which explains why you get a Ctrl-B representation as ^B.

As your initial character has a code > 255, it cannot fit in a char, so you need to store it in a wchar_t (it is < 65536, so it will fit in a wchar_t). You can then simply print it as a single wide character:

printf("%lc", 9474);

and you should get the correct │, provided the program has set its locale (for example with setlocale(LC_CTYPE, "")) and that locale is consistent with your terminal charset.

Upvotes: 3
