Reputation: 29
How do I convert a wchar_t value into its number in the Unicode table?
I have a variable:
wchar_t znak;
znak=getwchar();
When I type 'ą', how do I convert znak to #261? I need its number in the Unicode table.
ą U+0105 LATIN SMALL LETTER A WITH OGONEK
UTF-16: 0x0105
XML: &#261;
Upvotes: 1
Views: 985
Reputation: 8657
The standard doesn't specify sizeof(wchar_t) (or its encoding), so you should state which system you are on.
On Unix-like systems such as Linux, wchar_t is typically 32 bits and stores UTF-32 code points, which is a fixed-length encoding. You can use znak directly, with no conversion needed.
That said, you should first check whether UTF-8 and char would suit your task better overall (for this conversion UTF-32 is certainly more convenient, but your program might do more than that).
If you determine that UTF-8 is the better choice for your program, you can use mbstowcs to get UTF-32 code points out of your UTF-8 input.
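As a rough sketch of the mbstowcs route: the helper name first_code_point is made up for illustration, and the code assumes a 32-bit wchar_t and that the caller has already selected a UTF-8 locale with setlocale (the locale name available on a given system is also an assumption).

```c
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: returns the first code point of a multibyte string,
   or -1 if the string is empty or invalid in the current locale.
   Assumes wchar_t is 32 bits and holds UTF-32 (typical on Linux). */
long first_code_point(const char *mb)
{
    wchar_t wide[1];
    /* Convert at most one wide character from the multibyte input. */
    size_t converted = mbstowcs(wide, mb, 1);
    if (converted == (size_t)-1 || converted == 0)
        return -1;
    return (long)wide[0];
}
```

Typical use would be to call setlocale(LC_ALL, "") once at startup and then pass the UTF-8 bytes, for example first_code_point("\xC4\x85") for 'ą'.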
On Windows, wchar_t is 16 bits and stores UTF-16LE code units. For console I/O you are limited to UCS-2, though. The difference is that UTF-16 is not a fixed-length encoding: so-called surrogate pairs (albeit rare) allow the representation of non-BMP code points.
So in your case, since ą (U+0105) lies within the BMP, just using znak directly will work too.
For completeness' sake, though, here is a possible implementation from the UTF-16 Wikipedia article:
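/* u16/u32 are unsigned 16-/32-bit integer types; getu16() reads the next
   code unit and push_back() returns one to the input (helpers not shown). */
u32 read_code_point_from_utf16()
{
    u16 code_unit = getu16();
    if (code_unit >= 0xD800 && code_unit <= 0xDBFF) {  /* high surrogate */
        u16 code_unit_2 = getu16();
        if (code_unit_2 >= 0xDC00 && code_unit_2 <= 0xDFFF)  /* low surrogate */
            /* 0x35FDC00 == (0xD800 << 10) + 0xDC00 - 0x10000 */
            return (code_unit << 10) + code_unit_2 - 0x35FDC00;
        push_back(code_unit_2);  /* not a pair: leave it for the next call */
    }
    return code_unit;
}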
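The same surrogate-pair logic can be sketched in a self-contained form that reads from an array instead of the unspecified getu16/push_back helpers. The function name decode_utf16 and its signature are assumptions made for this example, not part of any library.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one code point from `units` (which must hold at least one unit).
   Stores the number of code units consumed (1 or 2) in *consumed.
   An unpaired surrogate is returned unchanged as a single unit. */
uint32_t decode_utf16(const uint16_t *units, size_t len, size_t *consumed)
{
    uint32_t first = units[0];
    *consumed = 1;
    if (first >= 0xD800 && first <= 0xDBFF && len >= 2) {   /* high surrogate */
        uint32_t second = units[1];
        if (second >= 0xDC00 && second <= 0xDFFF) {         /* low surrogate */
            *consumed = 2;
            return 0x10000 + ((first - 0xD800) << 10) + (second - 0xDC00);
        }
    }
    return first;
}
```

For a BMP character such as ą the function simply returns the single code unit 0x0105, which is why no conversion is needed in the question's case.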
Finally, use sprintf(s, "&#%d;", (int)znak) and sprintf(s, "0x%x", (unsigned)znak) to print it in the required base.
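A minimal sketch of that formatting step, using snprintf so the buffer size is checked. The helper names format_xml_entity and format_hex are invented for this example, and the casts assume znak holds a non-negative code point.

```c
#include <stdio.h>
#include <wchar.h>

/* Writes the decimal XML character reference, e.g. "&#261;".
   Returns the number of characters written (excluding the terminator). */
int format_xml_entity(wchar_t c, char *buf, size_t size)
{
    return snprintf(buf, size, "&#%lu;", (unsigned long)c);
}

/* Writes the hexadecimal form padded to four digits, e.g. "0x0105". */
int format_hex(wchar_t c, char *buf, size_t size)
{
    return snprintf(buf, size, "0x%04lX", (unsigned long)c);
}
```

When znak comes from getwchar(), it is usually necessary to call setlocale(LC_ALL, "") first so that the input is decoded according to the environment's locale; the result can then be passed as format_xml_entity(znak, s, sizeof s).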
Upvotes: 5