Tu ..

Reputation: 29

How do I convert a wchar_t to its Unicode number?

How do I convert a wchar_t value into its number in the Unicode table?

I have a variable:

wchar_t znak;
znak=getwchar();

If I type 'ą', how do I convert znak to #261? I need its number in the Unicode table.

ą U+0105 LATIN SMALL LETTER A WITH OGONEK

UTF-16: 0x0105

XML: &#261;

Upvotes: 1

Views: 985

Answers (1)

a3f

Reputation: 8657

The standard doesn't specify sizeof(wchar_t) (or its encoding), so you should state which system you are on.

Assuming *nix (Linux, BSD, OSX etc.)

wchar_t is 32 bits and stores UTF-32 code points. UTF-32 is a fixed-length encoding, so every wchar_t is one complete code point and you can use znak directly, with no conversion needed.
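A minimal sketch of that assumption, in case you want to check it at run time instead of trusting the platform. The helper names wchar_holds_code_points and code_point_of are made up for illustration; WCHAR_MAX is the standard macro from <wchar.h>.

```c
#include <wchar.h>

/* Returns 1 if wchar_t can represent every Unicode code point
   (up to U+10FFFF), 0 otherwise (e.g. a 16-bit wchar_t). */
int wchar_holds_code_points(void)
{
    return WCHAR_MAX >= 0x10FFFF;
}

/* Assumption: wchar_holds_code_points() returned 1, so the value of
   wc is already the code point and only a cast is needed. */
unsigned long code_point_of(wchar_t wc)
{
    return (unsigned long)wc;
}
```

Note that getwchar() decodes input according to the current locale, so you would normally call setlocale(LC_ALL, "") once at program start before reading znak.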

That said, you should first check whether UTF-8 and char aren't better suited to your task. (For this conversion UTF-32 is certainly more convenient, but your program might do more than that.)

If you determine that UTF-8 is the better overall choice for your program, you can use mbstowcs to get UTF-32 code points out of your UTF-8 string, provided the program's locale has been set to a UTF-8 one with setlocale.

Assuming Windows

wchar_t is 16 bits and stores UTF-16LE code units. For console I/O you are limited to UCS-2, though. The difference is that UTF-16 is not a fixed-length encoding: so-called surrogate pairs (albeit rare) allow the representation of non-BMP code points.

So in your case, just using znak directly will work too.

For completeness' sake, though, here is a possible implementation from the UTF-16 Wikipedia article:

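/* Pseudocode: u16/u32 are 16- and 32-bit unsigned integer types;
   getu16() reads the next code unit and push_back() un-reads one. */
u32 read_code_point_from_utf16()
{
  u16 code_unit = getu16();
  /* A high surrogate must be followed by a low surrogate. */
  if (code_unit >= 0xD800 && code_unit <= 0xDBFF) {
    u16 code_unit_2 = getu16();
    if (code_unit_2 >= 0xDC00 && code_unit_2 <= 0xDFFF)
      /* 0x35FDC00 == (0xD800 << 10) + 0xDC00 - 0x10000 */
      return (code_unit << 10) + code_unit_2 - 0x35FDC00;
    /* Not a valid pair: put the second unit back. */
    push_back(code_unit_2);
  }
  /* BMP code point (or an unpaired surrogate), returned as-is. */
  return code_unit;
}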
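If you want something you can compile as-is, here is a sketch of the same decoding step working on an array instead of the getu16/push_back helpers. The name decode_utf16 and its signature are my own choice, not taken from the Wikipedia article, and the caller is assumed to guarantee *pos < len.

```c
#include <stddef.h>
#include <stdint.h>

/* Decodes one code point starting at units[*pos] and advances *pos
   past the code unit(s) consumed. Unpaired surrogates are returned
   unchanged, like in the pseudocode above. */
uint32_t decode_utf16(const uint16_t *units, size_t len, size_t *pos)
{
    uint32_t hi = units[(*pos)++];
    if (hi >= 0xD800 && hi <= 0xDBFF && *pos < len) {
        uint32_t lo = units[*pos];
        if (lo >= 0xDC00 && lo <= 0xDFFF) {
            (*pos)++;
            /* Combine the 10 payload bits of each surrogate. */
            return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
        }
    }
    return hi;
}
```

For a BMP character such as ą the function simply returns the single code unit 0x0105; only characters outside the BMP take the surrogate branch.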

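Finally, use sprintf(s, "&#%lu;", (unsigned long)znak) and sprintf(s, "0x%04lX", (unsigned long)znak) to get it into the required notation. The cast is there because wchar_t is not guaranteed to be an int, so passing it straight to %d or %x is not portable.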
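As a sketch, the two formatting calls could be wrapped like this. The function names format_xml_ref and format_hex are hypothetical, and I have used snprintf rather than sprintf so the buffer size is checked.

```c
#include <stdio.h>
#include <wchar.h>

/* Writes the decimal XML character reference for wc, e.g. "&#261;". */
int format_xml_ref(char *buf, size_t size, wchar_t wc)
{
    return snprintf(buf, size, "&#%lu;", (unsigned long)wc);
}

/* Writes wc as hexadecimal, zero-padded to four digits, e.g. "0x0105". */
int format_hex(char *buf, size_t size, wchar_t wc)
{
    return snprintf(buf, size, "0x%04lX", (unsigned long)wc);
}
```

Both return what snprintf returns, so a result that is negative or not smaller than size means the output did not fit.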

Upvotes: 5
