Pratanu Mandal

Reputation: 647

C read and write unsigned char (0 - 255) as UTF-8

I am trying to read and write unsigned char values (0 - 255), i.e. extended ASCII characters (unicode), from and to the console in C. I am on Windows, but the code needs to be cross-platform.

Under extended ASCII (unicode), code-point 255 is ÿ and code-point 220 is Ü.

Right now I have the following code for writing and reading.

#include<stdio.h>
#include<locale.h>

int main() {
    setlocale(LC_ALL, "");

    unsigned char ch = 255;
    wprintf(L"Character %d = %lc\n", ch, ch);

    wprintf(L"Enter a character: ");
    wscanf(L"%lc", &ch);
    wprintf(L"Character %d = %lc\n", ch, ch);

    return 0;
}

The output is:

Character 255 = ÿ
Enter a character: ÿ
Character 220 = Ü

As evident, code-point 255 is displayed properly as ÿ. However, when taking ÿ as input, it is being read as code-point 220. Consequently, when code-point 220 is printed, it is displayed as Ü.

Thus, writing works fine. When reading, however, characters above 127 (128 - 255) come back wrong: in the example above, the code-point read is 35 less than the actual value (220 instead of 255).

Can you please help me understand what I am doing wrong and how I can fix this?

Upvotes: 1

Views: 716

Answers (1)

Schwern

Reputation: 164769

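%lc works with a wide character, wchar_t. "Wide" means it is larger than one byte, but the exact size is implementation-specific (typically 2 bytes on Windows and 4 bytes on Linux). With wscanf, %lc expects a pointer to a wchar_t and stores a whole wchar_t through it. Giving it the address of a 1-byte unsigned char is undefined behavior: it will write a byte or more past the end of ch, which is why you see odd values.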

But if you're using 1-byte characters, you don't need wprintf or wscanf at all. Just use printf and scanf.

And, as noted by others, "extended ASCII" is not "Unicode". See this question for more.

Upvotes: 1
