rustyx

Reputation: 85341

Is ISO-8859-1 encoding binary-safe in Java?

If I read a binary stream into a String using the ISO-8859-1 encoding, and subsequently convert it back to a binary stream, will I always get exactly the same bytes? And if not, when would I not get the same bytes?

public byte[] toStringAndBack(byte[] binaryData) throws Exception {
    String s = new String(binaryData, "ISO-8859-1");
    return s.getBytes("ISO-8859-1");
}

=== EDIT ===

Test:

    byte[] d = {0, 1, 2, 3, 4, (byte)128, (byte)129, (byte)130}; // includes values with no assigned character
    byte[] dd = toStringAndBack(d);
    for (byte b : dd)
        System.out.print((b&0xFF) + " ");

Output:

0 1 2 3 4 128 129 130

So even bytes with no assigned character seem to be converted back correctly.

Upvotes: 0

Views: 814

Answers (2)

wero

Reputation: 32980

Let's test it:

// all possible bytes
byte[] bin = new byte[256];
for (int i=0; i<bin.length; i++)
    bin[i] = (byte)i;

// convert to string
String s = new String(bin, "ISO-8859-1");
for (int i=0; i<s.length(); i++)
{
    if (s.charAt(i) != i)
        System.out.println(i + " s[i]=" + s.charAt(i));
}

// convert back to byte[]
byte[] bout = s.getBytes("ISO-8859-1");
for (int i=0; i<bin.length; i++)
{
    if (bin[i] != bout[i])
        System.out.println(i + " in=" + bin[i] + " bout=" + bout[i]);
}

System.out.println("done");

It prints only done.

Therefore, at least for the current ISO-8859-1 implementation, the round trip is binary-safe as defined in the question.

EDIT:
The current implementation is sun.nio.cs.ISO_8859_1. Looking at the source, it only checks whether a char is < 256 to decide whether it can be encoded.
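
To illustrate the "< 256" check from the caller's side, here is a small sketch using only public API. It shows that the reverse direction (String -> bytes -> String) is not lossless once a char falls outside that range. The class and method names (Latin1EncodeCheck, isEncodable, survivesRoundTrip) are made up for this example and are not part of the JDK.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class Latin1EncodeCheck {

    // True if every char of s can be encoded as ISO-8859-1.
    static boolean isEncodable(String s) {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder();
        return encoder.canEncode(s);
    }

    // True if s comes back unchanged after String -> bytes -> String.
    static boolean survivesRoundTrip(String s) {
        byte[] encoded = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(encoded, StandardCharsets.ISO_8859_1).equals(s);
    }

    public static void main(String[] args) {
        String latin = "caf\u00e9";   // every char is below 256
        String nonLatin = "\u0100";   // U+0100, the first char outside the range
        System.out.println(isEncodable(latin) + " " + survivesRoundTrip(latin));
        System.out.println(isEncodable(nonLatin) + " " + survivesRoundTrip(nonLatin));
    }
}
```

On the JDKs I know of this prints "true true" and then "false false": getBytes(Charset) substitutes a replacement byte for the unmappable char, so the original string cannot be recovered. This does not affect the byte[] -> String -> byte[] direction asked about, since decoding can only ever produce chars below 256.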

Upvotes: 1

T.J. Crowder

Reputation: 1074238

The documentation for the constructor you're using says:

The behavior of this constructor when the given bytes are not valid in the given charset is unspecified.

So in theory it could fail for any value ISO-8859-1 doesn't assign characters to, such as 0-31 and 127-159.

That means even if it works on a given JVM's String implementation (or Charset implementation for ISO-8859-1), you cannot rely on it working on another JVM's String/Charset implementation (whether that's just a different dot-rev of a JVM from the same vendor, or a different vendor's JVM).
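
If you'd rather not depend on unspecified behavior, one option is to go through CharsetDecoder and CharsetEncoder with CodingErrorAction.REPORT, so that any byte or char the charset implementation cannot handle raises a CharacterCodingException instead of being handled in some implementation-defined way. This is only a sketch: the class name Latin1RoundTrip and the helper names decodeStrict and encodeStrict are my own, and it doesn't guarantee that every JVM accepts all 256 byte values. It only guarantees that a failure is loud.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
class Latin1RoundTrip {

    // Decode bytes as ISO-8859-1, throwing instead of silently replacing.
    static String decodeStrict(byte[] data) throws CharacterCodingException {
        return StandardCharsets.ISO_8859_1.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(data))
                .toString();
    }

    // Encode a String as ISO-8859-1, throwing on any char that cannot be encoded.
    static byte[] encodeStrict(String s) throws CharacterCodingException {
        ByteBuffer buf = StandardCharsets.ISO_8859_1.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .encode(CharBuffer.wrap(s));
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    // Same shape as the method in the question, but with explicit error handling.
    static byte[] toStringAndBack(byte[] data) throws CharacterCodingException {
        return encodeStrict(decodeStrict(data));
    }

    public static void main(String[] args) throws CharacterCodingException {
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        // Prints true when the round trip preserves every byte value.
        System.out.println(Arrays.equals(all, toStringAndBack(all)));
    }
}
```

If the data really is binary, the safer design is usually to keep it as a byte[] (or use something like Base64 when a String is required) rather than relying on a character encoding at all.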

Upvotes: 1
