Ashwin

Reputation: 13517

What is the difference in bytes of a number as a string and as an integer?

Let's say we have my_string = "123456".

I do

my_string.getBytes()

and

BigInteger.valueOf(123456).toByteArray()

The resulting byte arrays are different in these two cases. Why is that so? Isn't "123456" the same as 123456, apart from the difference in data type?

Upvotes: 0

Views: 1964

Answers (3)

Stephen C

Reputation: 718788

I assume that you are asking about the total memory used to represent a number as a String versus a byte[].

The String size will depend on the actual string representation used. This depends on the JVM version; see What is the Java's internal represention for String? Modified UTF-8? UTF-16?

For Java 8 and earlier (with some caveats), the String consists of a String object with 1 int field and 1 reference field. Assuming 64-bit references, that adds up to 8 bytes of header + 1 x 4 bytes + 1 x 8 bytes + 4 bytes of padding. Then add the char[] used to represent the characters: 12 bytes of header + 2 bytes per character. This needs to be rounded up to a multiple of 8.

For Java 9 and later, the main object has the same size. (There is an extra field ... but that fits into the "padding".) The char[] is replaced by a byte[], and since you are just storing ASCII decimal digits [1], they will be encoded one character per byte.

In short, the asymptotic space usage is 1 byte per decimal digit for Java 9 or later and 2 bytes per decimal digit in Java 8 or earlier.

For the byte[] representation produced from a BigInteger, the representation consists of 12 bytes of header + 1 byte per byte of the value ... rounded up to a multiple of 8. The asymptotic size is 1 byte per byte.

In both cases there is also the size of the reference to the representation; i.e. another 8 bytes.

If you do the sums, the byte[] representation is more compact than the String representation in all cases. But int or long are significantly more compact than either of these representations in all cases.
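To make the comparison concrete, here is a minimal sketch that counts only the payload bytes (the digits versus the binary value) and ignores the object headers and padding discussed above. The class and method names are made up for illustration, and the 100-digit number is an arbitrary choice.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;

public class PayloadSizes {
    // Payload of the decimal text: one byte per ASCII digit.
    public static int digitBytes(BigInteger n) {
        return n.toString().getBytes(StandardCharsets.US_ASCII).length;
    }

    // Payload of the binary form: minimal big-endian two's-complement bytes.
    public static int binaryBytes(BigInteger n) {
        return n.toByteArray().length;
    }

    public static void main(String[] args) {
        BigInteger n = BigInteger.TEN.pow(99); // a 100-digit number (hypothetical input)
        System.out.println("decimal digits: " + digitBytes(n) + " bytes");
        System.out.println("binary value:   " + binaryBytes(n) + " bytes");
    }
}
```

Each byte of the binary form carries about 2.4 decimal digits' worth of information, which is why the byte[] payload comes out well under half the size of the digit string for large numbers.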


1 - If you are not ... or if you are curious why I added this caveat ... read the Q&A at the link above!

Upvotes: 1

Louis Wasserman

Reputation: 198033

No. Why would they be? "123456" is a sequence of characters: the ASCII character 1 (which is not represented as the number 1, but as the number 49), followed by the character 2 (50), and so on. 123456 as an int isn't represented as a sequence of decimal digits 0-9 at all; it's stored as a single number in binary.
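A small sketch of the two byte arrays side by side may help. The class and helper names are invented for illustration; BigInteger.valueOf is used because BigInteger has no public constructor that takes an int, and US_ASCII is named explicitly so the result does not depend on the platform default charset.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StringVsNumberBytes {
    // Bytes of the decimal text: one ASCII code per digit character.
    public static byte[] textBytes(String digits) {
        return digits.getBytes(StandardCharsets.US_ASCII);
    }

    // Bytes of the numeric value: big-endian two's-complement binary.
    public static byte[] valueBytes(long value) {
        return BigInteger.valueOf(value).toByteArray();
    }

    public static void main(String[] args) {
        // '1' is 49, '2' is 50, and so on.
        System.out.println(Arrays.toString(textBytes("123456")));  // [49, 50, 51, 52, 53, 54]
        // 123456 is 0x01E240; Java bytes are signed, so 0xE2 prints as -30.
        System.out.println(Arrays.toString(valueBytes(123456L)));  // [1, -30, 64]
    }
}
```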

Upvotes: 2

Josh

Reputation: 500

They are different because the String type is made up of Unicode characters. The character '2' is not at all the same as the numeric value 2.
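As a quick illustration of that last point, the sketch below (with made-up class and method names) contrasts the code of the character '2' with the digit it denotes.

```java
public class CharVsValue {
    // The character's Unicode code point, which is what a String stores.
    public static int codeOf(char c) {
        return c;
    }

    // The number the character denotes when read as a base-10 digit.
    public static int digitValueOf(char c) {
        return Character.digit(c, 10);
    }

    public static void main(String[] args) {
        System.out.println(codeOf('2'));        // 50
        System.out.println(digitValueOf('2'));  // 2
    }
}
```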

Upvotes: 3
