Jan Turoň

Reputation: 32912

Why does UTF-32 use four bytes?

If UTF-32 is UCS-4 restricted to 17 planes (1,114,112 code points, the highest being U+10FFFF = 1114111), which requires only 21 bits, what is the fourth byte doing?

Upvotes: 1

Views: 285

Answers (1)

rici

Reputation: 241721

The fourth byte is just sitting there, occupying space (which is filled with 0s).

In theory, a 21-bit or 24-bit interchange format could have been designed. In practice, those are both quite awkward. Few (if any) modern computers have 21- or 24-bit datatypes. Since 32-bit words are easy to work with, it is quite common to use them to store numeric datatypes whose maxima are considerably less than 2^31-1.
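As a quick illustration (a minimal sketch, not part of any standard API), the following Python snippet checks the bit count and shows the always-zero leading byte. The names `MAX_CODE_POINT` and `utf32_be_bytes` are made up for this example; only `int.bit_length`, `chr`, and the built-in `utf-32-be` codec come from Python itself.

```python
# Highest Unicode code point (end of plane 16).
MAX_CODE_POINT = 0x10FFFF


def utf32_be_bytes(code_point: int) -> bytes:
    """Encode one code point as big-endian UTF-32 (hypothetical helper)."""
    return chr(code_point).encode("utf-32-be")


# Number of bits needed to represent the largest code point.
bits_needed = MAX_CODE_POINT.bit_length()

# Even the largest code point leaves the most significant byte at zero.
top = utf32_be_bytes(MAX_CODE_POINT)

print(bits_needed)  # 21
print(top.hex())    # 0010ffff
```

Big-endian is used here only so the unused byte shows up first; with `utf-32-le` the same zero byte would appear last.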

Upvotes: 2
