MmM ...

Reputation: 131

True double byte encoding

Do any true double-byte encodings (DBCS) exist?

The same question for 4-byte encodings: do any exist (other than UCS-4 / UTF-32)?

Thanks.

Upvotes: 0

Views: 787

Answers (2)

John Bollinger

Reputation: 180201

There are certainly legacy character sets that use exactly two bytes for every character, but these generally do not encode ASCII characters at all, being intended to supplement a single-byte character set rather than replacing it. All of those that I am aware of exist to support Chinese, Japanese, and/or Korean ideograph characters.

There are plenty of legacy documents around that use such encodings, and I would not be surprised to find that in some places they are still used in new documents.

If you are trying to determine whether your software can ignore the existence of multi-byte character encodings other than the UTFs, then I'm afraid you won't come away with an easy answer. Of course your software can do so, in the same sense that it can ignore single-byte encodings other than ISO-8859-15, but only you can determine whether your program will adequately serve its purpose if it does so.

Upvotes: 2

Nayuki

Reputation: 18533

No, there are no double-byte character sets that satisfy your list of requirements. This is because designers back in the day used 7-bit ASCII as their starting point (good for compatibility), then put extra characters or multi-byte start codes in the upper half of the 256 byte values.
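To make the mixed-width design concrete, here is a minimal sketch that checks how many bytes each character takes under Python's built-in `big5` codec. The helper name `byte_widths` and the sample string are only for illustration, and the codec is Python's implementation rather than any particular vendor variant of Big5.

```python
def byte_widths(text, codec="big5"):
    """Return how many bytes each character of `text` occupies in `codec`."""
    return [len(ch.encode(codec)) for ch in text]


# Hypothetical sample: one ASCII letter followed by one Chinese character.
sample = "A中"
encoded = sample.encode("big5")

# The ASCII letter stays a single byte in the lower half (below 0x80);
# the Chinese character becomes two bytes, starting with a byte in the upper half.
widths = byte_widths(sample)
ascii_in_lower_half = encoded[0] < 0x80
lead_in_upper_half = encoded[1] >= 0x80
print(widths, ascii_in_lower_half, lead_in_upper_half)
```

So an encoding like this is "double-byte" only for the extended characters; ASCII text still takes one byte per character.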

Similarly for quad-byte character sets, no serious standard before Unicode even tried to provision for more than 65536 characters.

To give one example, Chinese Big5 uses the ASCII definitions for bytes 0x00 to 0x7F, uses 0x81 to 0xFF as the start byte for extended characters, and uses 0x40 to 0x7E or 0xA1 to 0xFE for the second byte. This scheme can encode at most 20067 different characters.
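The 20067 figure can be reproduced with a short calculation. This sketch takes the byte ranges exactly as stated above, without checking them against any particular Big5 variant, and counts the single-byte ASCII values together with every possible lead/trail pair. The names are made up for the example.

```python
# Byte ranges as stated in the answer (Python ranges exclude the upper bound).
ASCII_BYTES = range(0x00, 0x80)    # 0x00 to 0x7F
LEAD_BYTES = range(0x81, 0x100)    # 0x81 to 0xFF
TRAIL_BYTES = list(range(0x40, 0x7F)) + list(range(0xA1, 0xFF))  # 0x40-0x7E, 0xA1-0xFE


def max_code_points():
    """Upper bound on distinct characters: single-byte values plus all two-byte pairs."""
    single = len(ASCII_BYTES)
    double = len(LEAD_BYTES) * len(TRAIL_BYTES)
    return single + double


print(len(ASCII_BYTES), len(LEAD_BYTES), len(TRAIL_BYTES), max_code_points())
```

That is 128 single-byte values plus 127 × 157 two-byte combinations, which comes to 20067.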

Upvotes: 1
