Crackerman

Reputation: 733

Understanding Java Encoding

I am trying to determine if an in-house method will decode a byte array correctly given different encodings. The following code is how I approached generating data to encode.

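import java.io.UnsupportedEncodingException;

public class Encoding {

  // Every single-byte value from 0x00 through 0xFF (list elided here).
  static byte[] VALUES = {(byte) 0x00, ..... (byte) 0xFF};
  static String[] ENCODING = {"Windows-1252","ISO-8859-1"};

  public static void main(String[] args) throws UnsupportedEncodingException {

    for(String encode : ENCODING) {
      for(byte value : VALUES) {
        byte[] inputByte = new byte[]{value};
        // Decode the single byte using the encoding under test.
        String input = new String(inputByte, encode);
        // getBytes() with no argument: which encoding does this produce?
        String houseInput = houseMethod(input.getBytes());
      }
    }
  }
}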

My question is: when the house method is called, which encoding are the bytes it receives in? My understanding is that Java stores a String internally as UTF-16. So when I pass input.getBytes(), am I sending the UTF-16 bytes, or bytes in the encoding I specified when I created the String? I am guessing UTF-16, but I am not sure. Should the call to the house method instead be the following?

houseMethod(input.getBytes(encode))

Upvotes: 1

Views: 438

Answers (2)

Farrandu

Reputation: 371

As per the Java documentation for String.getBytes():

Encodes this String into a sequence of bytes using the platform's default charset, storing the result into a new byte array

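So the bytes that the in-house method gets depend on which OS you are running on, as well as your locale settings. The String does not remember the encoding it was decoded from, so getBytes() returns neither the UTF-16 representation nor the original encoding unless one of those happens to be the platform default.

On the other hand, String.getBytes(encoding) ensures you get the bytes in the encoding you pass as the parameter.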
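To make the difference concrete, here is a minimal sketch that decodes one byte and re-encodes it with an explicitly chosen charset. The class name GetBytesDemo and the helper recode are hypothetical names for illustration, not part of the question's code. Note that on JDK 18 and later the default charset is UTF-8 unless it has been overridden, so what the no-argument getBytes() returns also depends on the JDK version.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class GetBytesDemo {

    // Hypothetical helper: decode one byte with one charset,
    // then encode the resulting String with another.
    static byte[] recode(byte value, Charset decodeWith, Charset encodeWith) {
        String s = new String(new byte[] {value}, decodeWith);
        return s.getBytes(encodeWith);
    }

    public static void main(String[] args) {
        byte b = (byte) 0xE9; // U+00E9 (e with acute accent) in ISO-8859-1

        // What the no-argument getBytes() would use on this machine.
        System.out.println("Default charset: " + Charset.defaultCharset());

        // Same charset in and out: the original byte comes back.
        byte[] latin1 = recode(b, StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1);
        System.out.println("ISO-8859-1 length: " + latin1.length);

        // Different charset out: the same character needs two bytes in UTF-8.
        byte[] utf8 = recode(b, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println("UTF-8 length: " + utf8.length);
    }
}
```

Encoding with ISO-8859-1 returns the single byte 0xE9, while encoding with UTF-8 returns the two bytes 0xC3 0xA9. If the default charset is UTF-8, the no-argument getBytes() would hand the house method those two bytes.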

Upvotes: 2

Durandal

Reputation: 20059

See String.getBytes():

Encodes this String into a sequence of bytes using the platform's default charset, storing the result into a new byte array.

You are well advised to use the String.getBytes(Charset) method instead and explicitly specify the desired encoding.
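As a rough sketch of that advice applied to a test loop like the one in the question, the following counts how many of the 256 single-byte values come back unchanged when decoded and re-encoded with the same explicit Charset. The class name RoundTripCheck and the method survivingBytes are hypothetical, and the in-house method is left out because its signature is not shown. The Charset overloads also avoid the checked UnsupportedEncodingException.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class RoundTripCheck {

    // Counts the single-byte values that survive decode -> encode
    // unchanged when the same explicit charset is used both ways.
    static int survivingBytes(Charset charset) {
        int count = 0;
        for (int i = 0; i < 256; i++) {
            byte[] original = {(byte) i};
            String decoded = new String(original, charset);
            byte[] encoded = decoded.getBytes(charset); // explicit, not the platform default
            if (encoded.length == 1 && encoded[0] == original[0]) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("ISO-8859-1: " + survivingBytes(StandardCharsets.ISO_8859_1));
        // Assumes the running JDK ships the windows-1252 charset.
        System.out.println("windows-1252: " + survivingBytes(Charset.forName("windows-1252")));
        System.out.println("UTF-8: " + survivingBytes(StandardCharsets.UTF_8));
    }
}
```

ISO-8859-1 maps every byte value to a character, so all 256 values round-trip. With UTF-8 only the 128 ASCII values do, because a lone byte of 0x80 or above is malformed input and decodes to the replacement character.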

Upvotes: 4
