iSankha007

Reputation: 395

Binary to UTF-8 in C

I am working on a C application that needs to display UTF-8 encoded Unicode characters. The input arrives as a character array holding the binary digits of each byte, for example 11010000 10100100, which is the UTF-8 encoding of the character "Ф".

I want to store and display the character. I tried converting the binary digits to a hexadecimal character array, but printing with

void binaryToHex(char *bData) {
    char hexaDecimal[MAX];
    int temp;
    long int i = 0, j = 0;
    while (bData[i]) {
        bData[i] = bData[i] - 48;

        ++i;
    }

    --i;
    while (i - 2 >= 0) {
        temp = bData[i - 3] * 8 + bData[i - 2] * 4 + bData[i - 1] * 2 + bData[i];
        if (temp > 9)
            hexaDecimal[j++] = temp + 55;
        else
            hexaDecimal[j++] = temp + 48;
        i = i - 4;
    }

    if (i == 1)
        hexaDecimal[j] = bData[i - 1] * 2 + bData[i] + 48;
    else if (i == 0)
        hexaDecimal[j] = bData[i] + 48;
    else
        --j;

    printf("Equivalent hexadecimal value: ");
    char hexVal[MAX];
    // size_t len = j+1;
    int k = 0;
    while (j >= 0) {
        char ch = hexaDecimal[j--];
        if (j % 2 == 0) {
            hexVal[k] = '\\';
            k++;
            hexVal[k] = 'x';
            k++;
        }
        printf("\nkk++Length %d ...J= %d.. ", k, j);
        hexVal[k] = ch;
        k++;
        printf("%c", ch);
    }
    printf("KKKK+=== %d", k);
    hexVal[k] = '\0';

    // printf("\nkk++Length %d",strlen(hexVal));
    printf("\nMM+-+MM %s===\n ..>>>>", hexVal);
}

only shows the text \xD0\xA4, which I built with string manipulation. However, writing the bytes as a literal like this

 char s[] = "\xD0\xA4";
 /* or */
 char *s = "\xD0\xA4";
 printf("\n %s", s);

produces the desired result, printing the character "Ф". How can I build the correct string dynamically? Is there a C library for this?

The code is from http://www.cquestions.com/2011/07/binary-to-hexadecimal-conversion-in.html.

Is there a way to print the character directly from the binary digits or from a hex value? Or is there an alternative approach?

Upvotes: 1

Views: 3864

Answers (2)

iSankha007

Reputation: 395

In the end, what solved my problem for now was decoding the UTF-8 bits to the actual binary code point (for example, converting 11010000 10100100 to 10000 100100), converting that to decimal, and then converting the decimal code point to a Unicode character. Below is the link I used for converting a decimal code point to UTF-8:

C++ Windows decimal to UTF-8 Character Conversion

Resources I used:

https://www.youtube.com/watch?v=vLBtrd9Ar28

https://web.archive.org/web/20180216185523/http://www.zehnet.de/2005/02/12/unicode-utf-8-tutorial/
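For reference, the round trip described in this answer can be sketched in plain C. This is only an illustration under stated assumptions, not the code from the linked resources: the function names `utf8_decode` and `utf8_encode` are made up here, and the decoder checks only the lead and continuation bit patterns (it does not reject overlong forms or surrogates).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decodes one UTF-8 sequence starting at s.
 * Stores the code point in *cp and returns the number of bytes consumed,
 * or -1 if the bit patterns are not valid UTF-8 lead/continuation bytes.
 * Assumption: overlong encodings and surrogates are NOT rejected here. */
int utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    int need;          /* number of continuation bytes expected */
    uint32_t value;    /* payload bits collected so far */

    if (len == 0) return -1;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }            /* 0xxxxxxx */
    else if ((s[0] & 0xE0) == 0xC0) { need = 1; value = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { need = 2; value = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { need = 3; value = s[0] & 0x07; }
    else return -1;

    if (len < (size_t)need + 1) return -1;
    for (int i = 1; i <= need; ++i) {
        if ((s[i] & 0xC0) != 0x80) return -1;             /* must be 10xxxxxx */
        value = (value << 6) | (uint32_t)(s[i] & 0x3F);
    }
    *cp = value;
    return need + 1;
}

/* Hypothetical helper: encodes a code point as UTF-8 into out (up to 4 bytes).
 * Returns the number of bytes written, or -1 for surrogates and values
 * above U+10FFFF. */
int utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF) return -1;      /* surrogate range */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return -1;
}
```

For the bytes 0xD0 0xA4 the decoder strips the 110 and 10 prefixes, leaving 10000 and 100100, which joined together give 0x424 (decimal 1060), the code point of "Ф". Encoding 0x424 again reproduces the same two bytes, which shows that the input was already valid UTF-8 before any conversion.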

Upvotes: 0

John Bollinger

Reputation: 180398

Escape codes such as \xD0 are interpreted by the compiler when encountered in the value of a character or string literal. The compiler replaces them with the corresponding byte (or byte sequence in some cases). They are not meaningful to C at runtime.
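To illustrate the difference, here is a minimal sketch (the names `compiled`, `spelled`, and the two helper functions are invented for this example) contrasting a literal whose escapes the compiler has already translated with a string that merely spells the escapes out as text:

```c
#include <string.h>

/* The compiler turns each \x escape into one byte: 0xD0, 0xA4, then NUL. */
static const char compiled[] = "\xD0\xA4";

/* Here the backslashes are themselves escaped, so the array holds the eight
 * visible characters \ x D 0 \ x A 4. This is what building the text of an
 * escape sequence at runtime produces. */
static const char spelled[] = "\\xD0\\xA4";

/* Hypothetical helpers: report the length of each string for comparison. */
size_t compiled_length(void) { return strlen(compiled); }
size_t spelled_length(void)  { return strlen(spelled); }
```

The first string is two bytes long and a UTF-8 terminal renders it as "Ф"; the second is eight bytes long and prints as the literal text \xD0\xA4.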

You are therefore not only making it harder on yourself but doing altogether the wrong thing by constructing and printing the text of such escape sequences at runtime. What you get is exactly what you should expect. Just print the literal byte sequence you decode from the program input, without any dress-up.
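As a rough sketch of that advice, the binary digits can be packed into bytes and written out unchanged. The helper names `bits_to_bytes` and `print_bits_as_utf8` are hypothetical, and the code assumes the input consists only of '0', '1', and spaces between complete bytes, as in the question's example:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: parses text such as "11010000 10100100" into raw
 * bytes. Returns the number of bytes written, or -1 on malformed input
 * or if out is too small. */
int bits_to_bytes(const char *bits, unsigned char *out, size_t out_size)
{
    size_t n = 0;          /* bytes produced so far */
    unsigned value = 0;    /* byte currently being assembled */
    int nbits = 0;         /* bits collected for the current byte */

    for (const char *p = bits; *p != '\0'; ++p) {
        if (*p == ' ') {
            if (nbits != 0) return -1;   /* separator in the middle of a byte */
            continue;
        }
        if (*p != '0' && *p != '1') return -1;
        value = (value << 1) | (unsigned)(*p - '0');
        if (++nbits == 8) {
            if (n >= out_size) return -1;
            out[n++] = (unsigned char)value;
            value = 0;
            nbits = 0;
        }
    }
    if (nbits != 0) return -1;           /* trailing partial byte */
    return (int)n;
}

/* Hypothetical helper: writes the decoded bytes to stdout exactly as they
 * are. Returns the byte count, or -1 if the input could not be parsed. */
int print_bits_as_utf8(const char *bits)
{
    unsigned char buf[64];
    int n = bits_to_bytes(bits, buf, sizeof buf);
    if (n < 0) return -1;
    fwrite(buf, 1, (size_t)n, stdout);
    fputc('\n', stdout);
    return n;
}
```

Calling `print_bits_as_utf8("11010000 10100100")` writes the bytes 0xD0 0xA4 followed by a newline. Whether they appear as "Ф" depends on the terminal or console being set to UTF-8, which is outside what this sketch controls.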

Upvotes: 4
