user4344

Widechar to bytes using a bit pattern?

If the number of bytes in a UTF-8-encoded wide char is known, is it possible to get the bytes using the following method?

For example:

The wide character ¿ (code 191) converts to the bytes -62 and -65.

I tried to fit the 8 bits of 191 into the slots, but didn't get the same result:

110[0][0][0][1][0]   10[1][1][1][1][1][1]

      127                   255

Answers (1)

Jon

First, don't convert to signed bytes; that just confuses matters. Read as unsigned values, -62 and -65 are 194 and 191 (add 256 to each), so code point 191 yields the byte sequence 194, 191.
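A short Python check of the signed/unsigned relationship, assuming the -62 and -65 in the question came from a signed 8-bit byte type (the question does not say which language was used). Python's `bytes` are already unsigned, so the signed view is computed by hand here.

```python
# The UTF-8 bytes of U+00BF (the character ¿), as unsigned values 0..255.
data = "\u00bf".encode("utf-8")
unsigned = list(data)
print(unsigned)  # [194, 191]

# The same bytes viewed as signed 8-bit values (two's complement),
# which is what a signed byte type reports.
signed = [b - 256 if b > 127 else b for b in unsigned]
print(signed)  # [-62, -65]

# Masking with 0xFF turns the signed values back into unsigned ones.
print([s & 0xFF for s in signed])  # [194, 191]
```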

Decimal: 194                   191
Binary:  110[0][0][0][1][0]    10[1][1][1][1][1][1]

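To generate these bytes, start from the right edge of the code point (shown below as a 16-bit value). Its low six bits fill the slots of the 191, the next two fill the lowest slots of the 194, and three more bits (all zero here) fill the remaining slots of the 194, yielding: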

Binary:  00000[0][0][0]    [1][0][1][1][1][1][1][1]
Decimal: 0                 191
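The slot filling can be sketched in Python as follows, assuming the code point is already known to need exactly two bytes (U+0080 to U+07FF). The function names `encode_two_byte` and `decode_two_byte` are illustrative. The code point 191 supplies 11 payload bits (`00010111111`, padded with leading zeros), split 5 + 6 across the two bytes.

```python
def encode_two_byte(code_point: int) -> bytes:
    """Fill the 110xxxxx 10xxxxxx template with an 11-bit code point."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("two-byte UTF-8 covers U+0080..U+07FF only")
    lead = 0b11000000 | (code_point >> 6)          # top 5 payload bits
    cont = 0b10000000 | (code_point & 0b00111111)  # low 6 payload bits
    return bytes([lead, cont])


def decode_two_byte(lead: int, cont: int) -> int:
    """Reverse the process: pull the payload bits back out (unsigned inputs).

    This sketch does not reject overlong forms such as 0xC0 0x80.
    """
    if lead >> 5 != 0b110 or cont >> 6 != 0b10:
        raise ValueError("not a two-byte UTF-8 sequence")
    return ((lead & 0b00011111) << 6) | (cont & 0b00111111)


encoded = encode_two_byte(191)
print(list(encoded))              # [194, 191]
print(decode_two_byte(*encoded))  # 191
```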

Wikipedia's UTF-8 article has a surprisingly good write-up on how this all works.
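The same right-to-left filling extends to the other UTF-8 templates: `0xxxxxxx` for one byte, and `1110xxxx` or `11110xxx` followed by two or three `10xxxxxx` continuation bytes for the longer forms. A minimal Python sketch for a single code point, with an illustrative function name `utf8_encode`, checked against the built-in `str.encode("utf-8")`:

```python
def utf8_encode(code_point: int) -> bytes:
    """Fill the UTF-8 bit templates for a single code point (1 to 4 bytes)."""
    if code_point < 0 or code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a valid Unicode scalar value")
    if code_point < 0x80:
        return bytes([code_point])          # 0xxxxxxx
    if code_point < 0x800:
        n, lead_marker = 2, 0b11000000      # 110xxxxx + 1 continuation byte
    elif code_point < 0x10000:
        n, lead_marker = 3, 0b11100000      # 1110xxxx + 2 continuation bytes
    else:
        n, lead_marker = 4, 0b11110000      # 11110xxx + 3 continuation bytes
    out = []
    # Fill the continuation bytes from the right edge, six bits at a time.
    for _ in range(n - 1):
        out.append(0b10000000 | (code_point & 0b00111111))
        code_point >>= 6
    # Whatever bits remain go into the lead byte's free slots.
    out.append(lead_marker | code_point)
    return bytes(reversed(out))


for ch in ["A", "\u00bf", "\u20ac", "\U0001F600"]:
    assert utf8_encode(ord(ch)) == ch.encode("utf-8")
```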
