Why is there no Unicode starting with 0xC1?

Question

While studying Unicode and the UTF-8 encoding, I noticed that the 129th Unicode code point (U+0080) is encoded in UTF-8 as a sequence starting with 0xc2.

I checked the lead bytes up to 0xcf.

No code point was encoded with a sequence starting with 0xc1.

Why does the encoding of the 129th code point start with 0xc2 instead of 0xc1?

gnasher729 · Accepted Answer

A two-byte UTF-8 sequence starting with 0xc1 would encode a Unicode code point in the range 0x40 to 0x7f. One starting with 0xc0 would encode a code point in the range 0x00 to 0x3f.
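
To see where those ranges come from, here is a minimal sketch that applies the two-byte bit layout (110xxxxx 10yyyyyy) without any validity checks. The helper name decode_two_byte_raw is made up for illustration; it is not part of any library.

```python
def decode_two_byte_raw(lead: int, cont: int) -> int:
    """Apply the 2-byte UTF-8 bit layout (110xxxxx 10yyyyyy).

    Hypothetical helper: it deliberately skips the validity checks
    a real decoder performs.
    """
    # 5 payload bits from the lead byte, 6 from the continuation byte.
    return ((lead & 0x1F) << 6) | (cont & 0x3F)


for lead in (0xC0, 0xC1, 0xC2):
    low = decode_two_byte_raw(lead, 0x80)   # smallest continuation byte
    high = decode_two_byte_raw(lead, 0xBF)  # largest continuation byte
    print(f"lead 0x{lead:02X}: U+{low:04X} .. U+{high:04X}")

# lead 0xC0: U+0000 .. U+003F
# lead 0xC1: U+0040 .. U+007F
# lead 0xC2: U+0080 .. U+00BF
```

Only from lead byte 0xc2 onward does the two-byte form reach code points that do not already fit in one byte.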

There is an iron rule that every code point is represented in UTF-8 in the shortest possible way. Since all these code points can be stored in a single UTF-8 byte, they are not allowed to be stored using two bytes.
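
As a quick check of the shortest-form rule, the sketch below uses Python's built-in UTF-8 codec, which rejects such overlong sequences. The helper is_valid_utf8 is a made-up name for illustration.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# 'A' is U+0041, which fits in one byte.
print("A".encode("utf-8"))          # b'A'

# 0xC1 0x81 carries the same bits as U+0041 in two bytes: overlong, rejected.
print(is_valid_utf8(b"\xc1\x81"))   # False

# U+0080 is the first code point that needs two bytes.
print("\u0080".encode("utf-8"))     # b'\xc2\x80'
```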

For the same reason you will find that there are no valid 4-byte sequences starting with 0xf0 0x80 through 0xf0 0x8f: they would encode code points below 0x10000, which are stored using three bytes or fewer instead.
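
The same arithmetic can be sketched for the four-byte layout (11110xxx followed by three 10yyyyyy bytes). The helper decode_four_byte_raw is hypothetical and skips all validity checks on purpose.

```python
def decode_four_byte_raw(b0: int, b1: int, b2: int, b3: int) -> int:
    """Apply the 4-byte UTF-8 bit layout without validity checks.

    Hypothetical helper for illustration only.
    """
    return (
        ((b0 & 0x07) << 18)
        | ((b1 & 0x3F) << 12)
        | ((b2 & 0x3F) << 6)
        | (b3 & 0x3F)
    )


# Largest value reachable with a 0xF0 0x8F prefix.
largest_overlong = decode_four_byte_raw(0xF0, 0x8F, 0xBF, 0xBF)
# Smallest value reachable with a 0xF0 0x90 prefix.
smallest_valid = decode_four_byte_raw(0xF0, 0x90, 0x80, 0x80)

print(hex(largest_overlong))  # 0xffff
print(hex(smallest_valid))    # 0x10000

# U+FFFF already fits in three bytes, so the 4-byte form is never used.
print(chr(largest_overlong).encode("utf-8"))  # b'\xef\xbf\xbf'
```

The three-byte form has the same kind of gap: sequences starting with 0xe0 0x80 through 0xe0 0x9f would encode code points below 0x800, which fit in two bytes or fewer.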
