Reputation: 17648
I was surprised to find that the following code
System.out.println("Character size:"+Character.SIZE/8);
System.out.println("String size:"+"a".getBytes().length);
outputs this:
Character size:2
String size:1
I would assume that a single-character string should take up the same number of bytes as a single char, or more.
In particular, I am wondering:
If I have a Java bean with several fields in it, how will its size increase depending on the nature of the fields (Character, String, Boolean, Vector, etc.)? I'm assuming that all Java objects have some (probably minimal) footprint, and that one of the smallest of these footprints would be a single character. To test that basic assumption I started with the above code, and the results of the print statements seem counterintuitive.
Any insights into the way Java stores/serializes characters vs. strings by default would be very helpful.
Upvotes: 7
Views: 22169
Reputation: 4274
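getBytes() encodes the String using the platform's default charset (most likely ISO-8859-1 in this case, where 'a' takes one byte), while a char always has 2 bytes. Historically, Java has stored a String internally as an array of 2-byte chars (since Java 9 a Latin-1-only string may be stored more compactly, but the API still exposes 2-byte chars). If you want to know more about encoding, read the link by Oded in the question comments.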
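As a rough illustration of the point about the default encoding, the sketch below prints the platform's default charset and then encodes "a" with a few explicitly chosen charsets. The class name EncodingSizes and the helper byteCount are made up for this example; only the standard getBytes(Charset) and Charset.defaultCharset() calls are relied on.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingSizes {
    // Number of bytes the given string occupies once encoded with the given charset.
    // (Hypothetical helper, introduced only for this illustration.)
    static int byteCount(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // The charset that the no-argument getBytes() falls back to on this machine.
        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println("ISO-8859-1: " + byteCount("a", StandardCharsets.ISO_8859_1)); // 1
        System.out.println("UTF-8:      " + byteCount("a", StandardCharsets.UTF_8));      // 1
        System.out.println("UTF-16BE:   " + byteCount("a", StandardCharsets.UTF_16BE));   // 2
    }
}
```

The first line of output depends on the machine and JVM settings, which is why passing the charset explicitly is usually the safer choice.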
Upvotes: 10
Reputation: 23780
I want to add some code first and then a bit of explanation:
import java.nio.charset.StandardCharsets;

public class Main {
    public static void main(String[] args) {
        System.out.println("Character size: " + Character.SIZE / 8);
        // Ask explicitly for UTF-16 instead of relying on the default charset.
        final byte[] bytes = "a".getBytes(StandardCharsets.UTF_16);
        System.out.println("String size: " + bytes.length);
        for (byte b : bytes) {
            printByteAsHex(b);
        }
        System.out.println();
    }

    // Prints one byte as two hex digits; %02x keeps the leading zero of 0x00.
    static void printByteAsHex(byte b) {
        System.out.print(String.format("%02x", b & 0xFF));
    }
}
And the output will be:
Character size: 2
String size: 4
feff0061
So what you are actually missing is that you do not pass a charset to getBytes(), so it uses the platform's default charset. You are probably getting the bytes of the UTF-8 representation of the character 'a', which is a single byte.
But why did we get 4 bytes when we asked for UTF-16? Java uses UTF-16 internally, so shouldn't we have gotten 2 bytes?
If you examine the output:
feff0061
Java actually returned a BOM (byte order mark): https://en.wikipedia.org/wiki/Byte_order_mark.
The first 2 bytes, feff, signal that the following bytes are UTF-16 big-endian. Please see the Wikipedia page for further information.
The remaining 2 bytes, 0061, are the 2-byte representation of the character "a". This can be verified at: http://www.fileformat.info/info/unicode/char/0061/index.htm
So yes, a char in Java is 2 bytes, but when you ask for bytes without specifying an encoding, you may not get 2 bytes per character, since different encodings require different numbers of bytes for various characters.
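To make the last point concrete, here is a small sketch that encodes a few different characters as UTF-8 and counts the bytes. The class name Utf8Widths and the helper utf8Length are invented for this example, and the characters are written as Unicode escapes so the result does not depend on the source file's encoding.

```java
import java.nio.charset.StandardCharsets;

class Utf8Widths {
    // Number of bytes the string needs in UTF-8.
    // (Hypothetical helper, introduced only for this illustration.)
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("a"));            // 1: plain ASCII letter
        System.out.println(utf8Length("\u00e9"));       // 2: e with acute accent
        System.out.println(utf8Length("\u20ac"));       // 3: euro sign
        // A character outside the Basic Multilingual Plane: two Java chars, four UTF-8 bytes.
        String emoji = "\uD83D\uDE00";
        System.out.println(emoji.length());             // 2
        System.out.println(utf8Length(emoji));          // 4
    }
}
```

The last case also shows that even "one char is 2 bytes" does not mean "one character is 2 bytes": some characters need two chars in UTF-16.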
Upvotes: 0
Reputation: 4693
Well, what you have found is that one char in a char array takes 2 bytes and that your String is 1 character long (which encodes to 1 byte), not that the String object occupies 1 byte in memory.
The String object in Java (in older JDKs such as Java 6) consists of:
private final char value[];
private final int offset;
private final int count;
private int hash;
These fields alone should assure you that a String object is always bigger than its char array.
If you want to learn more about object sizes, you can also read about object headers and the multiplicity factor for char arrays. For example here or here.
Upvotes: 0
Reputation: 10422
I would like to say what I think; correct me if I am wrong. You are finding the length of the string, which is correctly shown as 1, since the string contains only one character. length gives the length, not the size; length and size are two different things.
Check this link. You are finding the number of bytes occupied in the wrong way.
Upvotes: 2
Reputation: 12283
The SIZE of a Character is the storage needed for a char, which is 16 bits. The length of a string (or of the underlying char array or byte array) is the number of characters (or bytes), not a size in bits.
That's why you had to divide by 8 for the size, but not for the length. To get the number of bytes the chars occupy, the length of the string needs to be multiplied by two.
Also note that you will get other lengths for the byte array if you specify a different encoding. In your case, getBytes() converted the string to a single-byte or variable-width encoding.
See: http://docs.oracle.com/javase/6/docs/api/java/lang/String.html#getBytes(java.nio.charset.Charset)
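A minimal sketch of the length-versus-size distinction: the char count times two gives the bytes taken by the chars, which matches an explicit UTF-16BE encoding (no BOM). The class name LengthVersusBytes and the helper charBytes are assumptions made for this example.

```java
import java.nio.charset.StandardCharsets;

class LengthVersusBytes {
    // Bytes taken by the string's chars, computed from the length (2 bytes per char).
    // (Hypothetical helper, introduced only for this illustration.)
    static int charBytes(String s) {
        return s.length() * (Character.SIZE / 8);
    }

    public static void main(String[] args) {
        String s = "hello";
        System.out.println("length():   " + s.length());   // 5
        System.out.println("char bytes: " + charBytes(s)); // 10
        // UTF-16BE writes each char as 2 bytes and adds no BOM, so it matches.
        System.out.println("UTF-16BE:   " + s.getBytes(StandardCharsets.UTF_16BE).length); // 10
    }
}
```

This counts only the character data; it says nothing about the object header or the other fields of the String object.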
Upvotes: -2