asaini007

Reputation: 836

Why does writeBytes discard each character's high eight bits?

I wanted to use DataOutputStream#writeBytes, but was running into errors. Description of writeBytes(String) from the Java Documentation:

Writes out the string to the underlying output stream as a sequence of bytes. Each character in the string is written out, in sequence, by discarding its high eight bits.

I think the problem I'm running into is due to the part about "discarding its high eight bits". What does that mean, and why does it work that way?

Upvotes: 6

Views: 595

Answers (2)

Sage

Reputation: 15418

The char data type is a single 16-bit Unicode character, with a minimum value of '\u0000' (0) and a maximum value of '\uffff' (65,535). The byte data type, by contrast, is an 8-bit signed two's complement integer, with a minimum value of -128 and a maximum value of 127. A char does not fit in a byte, which is why this method writes only the low-order byte of each char in the string, from first to last. Any information in the high-order byte is lost. In other words, it assumes the string contains only characters whose value is between 0 and 255.
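To make the truncation concrete, here is a minimal sketch that writes a three-character string through writeBytes into an in-memory buffer and prints the resulting bytes in hex. The class name WriteBytesDemo and the helper viaWriteBytes are made up for this illustration; only the java.io calls are from the JDK.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class WriteBytesDemo {
    // Hypothetical helper: returns exactly what writeBytes puts on the stream.
    static byte[] viaWriteBytes(String s) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeBytes(s);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // 'A' is U+0041, 'é' is U+00E9, '€' is U+20AC
        byte[] bytes = viaWriteBytes("A\u00e9\u20ac");
        for (byte b : bytes) {
            // Mask with 0xff so the signed byte prints as an unsigned value
            System.out.printf("%02x ", b & 0xff);
        }
        System.out.println(); // the line printed is: 41 e9 ac
    }
}
```

The euro sign (0x20AC) comes out as the single byte 0xAC: its high byte 0x20 is dropped, so a reader has no way to tell it apart from U+00AC.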

You may want to look at the writeUTF(String s) method, which retains the information in the high-order byte and also records the length of the string. First it writes the length of the encoded string, in bytes, to the underlying output stream as a 2-byte unsigned value between 0 and 65,535. Next it encodes the string in modified UTF-8 and writes the encoded bytes to the underlying output stream. This allows a data input stream reading those bytes with readUTF to completely reconstruct the string.
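A round trip through writeUTF and readUTF can be sketched as follows. The class name WriteUtfDemo and the helpers encode and decode are hypothetical names for this example; the stream classes and methods are standard java.io.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class WriteUtfDemo {
    // Hypothetical helper: the bytes writeUTF produces for s.
    static byte[] encode(String s) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        }
        return buf.toByteArray();
    }

    // Hypothetical helper: reads back one string written by writeUTF.
    static String decode(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readUTF();
        }
    }

    public static void main(String[] args) throws IOException {
        String original = "A\u00e9\u20ac"; // 'A', 'é', '€'
        byte[] data = encode(original);
        // 2 length bytes + 1 byte ('A') + 2 bytes ('é') + 3 bytes ('€')
        System.out.println(data.length);                   // prints 8
        System.out.println(decode(data).equals(original)); // prints true
    }
}
```

Note that the 2-byte length prefix caps the encoded form at 65,535 bytes, so this format is not suited to arbitrarily long strings.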

Upvotes: 5

Most Western programmers tend to think in terms of ASCII, where one character equals one byte, but Java Strings are made of 16-bit chars (UTF-16 code units). writeBytes just writes out the low byte of each char, which for ASCII/ISO-8859-1 text is the "character" in the C sense.
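As a rough check of that claim, the sketch below compares the output of writeBytes with an explicit ISO-8859-1 encoding of the same string. The class name Latin1Demo and the helpers lowBytes and latin1 are invented for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Latin1Demo {
    // Hypothetical helper: the low byte of each char, as written by writeBytes.
    static byte[] lowBytes(String s) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeBytes(s);
        }
        return buf.toByteArray();
    }

    // Hypothetical helper: explicit ISO-8859-1 encoding of the same string.
    static byte[] latin1(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        String text = "caf\u00e9"; // every char is <= 0xFF
        System.out.println(Arrays.equals(lowBytes(text), latin1(text))); // prints true
    }
}
```

The two agree only while every char is at most 255. For something like '€' (U+20AC), writeBytes keeps the low byte 0xAC, whereas getBytes substitutes a replacement byte ('?'), so an explicit charset at least makes the loss visible.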

Upvotes: 7
