I heard that I don't have to place a BOM at the start of a UTF-8 file/stream.
Does it have a fixed byte order, then?
What about UTF-16 and UTF-32 in this case?
UTF-8 does not need a byte order because it is defined as a stream of individual bytes: the order is given directly by each byte's position in the stream. A single code point is encoded as a variable number of bytes (one to four).
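A quick way to see this is to encode a few characters with Python's built-in `str.encode`. This is just an illustrative sketch; the sample characters are arbitrary picks covering the 1- to 4-byte cases.

```python
# Each code point becomes 1 to 4 bytes in UTF-8. The byte sequence is the
# same on every machine, so there is no byte order to signal.
samples = ["A", "é", "€", "😀"]
for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {encoded.hex(' ')} ({len(encoded)} byte(s))")
# U+0041 -> 41 (1 byte(s))
# U+00E9 -> c3 a9 (2 byte(s))
# U+20AC -> e2 82 ac (3 byte(s))
# U+1F600 -> f0 9f 98 80 (4 byte(s))
```

The multi-byte sequences always start with the lead byte, so reading them left to right is unambiguous.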
UTF-32, on the other hand, is defined as a stream of 32-bit units (4 bytes each, each mapping directly to a Unicode code point), and those units can be serialized into bytes in different ways.
That is what the BOM tells you: whether the bytes of each unit are stored in increasing order of significance (the earliest byte in the stream is the least significant, i.e. little endian) or in decreasing order (the earliest byte is the most significant, i.e. big endian).
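Here is a small Python sketch of the two byte orders and of picking the order from a leading BOM. `decode_utf32_with_bom` is a hypothetical helper written for illustration (it simply rejects input without a BOM); the `codecs.BOM_UTF32_LE`/`codecs.BOM_UTF32_BE` constants and the `utf-32-le`/`utf-32-be` codecs are part of Python's standard library.

```python
import codecs

# The same code point serialized as UTF-32 in both byte orders.
text = "A"  # U+0041
little = text.encode("utf-32-le")
big = text.encode("utf-32-be")
print(little.hex(" "))  # 41 00 00 00
print(big.hex(" "))     # 00 00 00 41

def decode_utf32_with_bom(data: bytes) -> str:
    """Hypothetical helper: read the byte order from a leading UTF-32 BOM
    (U+FEFF), then decode the remaining bytes with that order."""
    if data.startswith(codecs.BOM_UTF32_LE):   # ff fe 00 00
        return data[len(codecs.BOM_UTF32_LE):].decode("utf-32-le")
    if data.startswith(codecs.BOM_UTF32_BE):   # 00 00 fe ff
        return data[len(codecs.BOM_UTF32_BE):].decode("utf-32-be")
    raise ValueError("no UTF-32 BOM found")

print(decode_utf32_with_bom(codecs.BOM_UTF32_BE + big))  # A
```

Without the BOM, the bytes `41 00 00 00` and `00 00 00 41` cannot both be read correctly by the same decoder, which is exactly the ambiguity the BOM resolves.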
UTF-16 is similar but a bit funkier. It is defined as a stream of 16-bit units, so you have to worry about the byte order. Additionally, since a single 16-bit unit is no longer enough to encode all of Unicode (code points above U+FFFF need a pair of units, a so-called surrogate pair), it is also a multi-unit encoding, thus combining the shortcomings of UTF-8 and UTF-32 :)
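To illustrate both problems at once, here is a minimal Python sketch that splits a code point into UTF-16 units by hand and then serializes those units in each byte order. `utf16_units` is a hypothetical helper for illustration only; it does no input validation (e.g. it does not reject lone surrogate values in U+D800 to U+DFFF).

```python
def utf16_units(code_point: int) -> list[int]:
    """Hypothetical helper: split a code point into UTF-16 code units
    (a single unit, or a high/low surrogate pair)."""
    if code_point < 0x10000:
        return [code_point]
    offset = code_point - 0x10000       # 20 bits remain
    high = 0xD800 + (offset >> 10)      # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits -> low surrogate
    return [high, low]

units = utf16_units(0x1F600)  # 😀 needs two units
print([hex(u) for u in units])  # ['0xd83d', '0xde00']

# Each 16-bit unit must still be written in some byte order.
be = b"".join(u.to_bytes(2, "big") for u in units)
le = b"".join(u.to_bytes(2, "little") for u in units)
print(be.hex(" "))  # d8 3d de 00
print(le.hex(" "))  # 3d d8 00 de
```

The results match Python's own `utf-16-be` and `utf-16-le` codecs, so a decoder needs to know both the byte order and how to recombine surrogate pairs.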