Reputation: 13517
Let's say we have a my_string = "123456"
I do
my_string.getBytes()
and
BigInteger.valueOf(123456).toByteArray()
The resulting byte arrays are different for both these cases. Why is that so? Isn't "123456" same as 123456 other than the difference in data type?
Upvotes: 0
Views: 1964
Reputation: 718788
I assume that you are asking about the total memory used to represent a number as a String versus a byte[].
The String size will depend on the actual string representation used. This depends on the JVM version; see What is the Java's internal represention for String? Modified UTF-8? UTF-16?
For Java 8 and earlier (with some caveats), the String consists of a String object with 1 int field and 1 reference field. Assuming 64 bit references, that adds up to 8 bytes of header + 1 x 4 bytes + 1 x 8 bytes + 4 bytes of padding. Then add the char[] used to represent the characters: 12 bytes of header + 2 bytes per character. This needs to be rounded up to a multiple of 8.
For Java 9 and later, the main object has the same size. (There is an extra field ... but that fits into the "padding".) The char[] is replaced by a byte[], and since you are just storing ASCII decimal digits¹, they will be encoded one character per byte.
In short, the asymptotic space usage is 1 byte per decimal digit for Java 9 or later and 2 bytes per decimal digit in Java 8 or earlier.
For the byte[] representation produced from a BigInteger, the representation consists of 12 bytes of header + 1 byte per byte ... rounded up to a multiple of 8. The asymptotic size is 1 byte per byte.
In both cases there is also the size of the reference to the representation; i.e. another 8 bytes.
If you do the sums, the byte[] representation is more compact than the String representation in all cases. But int or long are significantly more compact than either of these representations, for values that fit in them.
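The sums for the backing arrays can be sketched roughly as follows. This is only an estimate under the assumptions stated in this answer (a 12 byte array header, sizes rounded up to a multiple of 8); real layouts vary by JVM and flags. The class and method names (SizeEstimate, roundUp8, arrayBytes) are made up for illustration.

```java
import java.math.BigInteger;

public class SizeEstimate {
    // Assumed array header size, taken from the figures quoted in the answer.
    static final int ARRAY_HEADER = 12;

    // Round a size up to the next multiple of 8 (object alignment assumed above).
    static long roundUp8(long n) {
        return (n + 7) / 8 * 8;
    }

    // Estimated footprint of an array with the given length and element size.
    static long arrayBytes(int length, int bytesPerElement) {
        return roundUp8(ARRAY_HEADER + (long) length * bytesPerElement);
    }

    public static void main(String[] args) {
        String digits = "123456";
        int bigIntLen = new BigInteger(digits).toByteArray().length; // 3 bytes for 123456

        System.out.println("char[] (Java 8 style): " + arrayBytes(digits.length(), 2)); // 24
        System.out.println("byte[] (Java 9 style): " + arrayBytes(digits.length(), 1)); // 24
        System.out.println("BigInteger byte[]:     " + arrayBytes(bigIntLen, 1));       // 16
    }
}
```

For short inputs the rounding dominates, which is why the two String cases come out the same here; the 2x versus 1x difference only shows up as the digit count grows.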
1 - If you are not ... or if you are curious why I added this caveat ... read the Q&A at the link above!
Upvotes: 1
Reputation: 198033
No. Why would they be? "123456" is a sequence of the ASCII character 1 (which is not represented as the number 1, but as the number 49), followed by the character 2 (50), and so on. 123456 as an int isn't even represented as a sequence of digits from 0-9; it's stored as a number in binary.
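A small sketch that prints both byte arrays side by side may make this concrete. The class and helper names (ByteArrayDemo, textBytes, numberBytes) are made up for illustration, and the charset is pinned to US-ASCII so the result does not depend on the platform default.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ByteArrayDemo {
    // Encodes each character of the text as one ASCII byte.
    static byte[] textBytes(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    // Big-endian two's-complement bytes of the numeric value.
    static byte[] numberBytes(long n) {
        return BigInteger.valueOf(n).toByteArray();
    }

    public static void main(String[] args) {
        // One byte per digit character: '1' is 49, '2' is 50, ...
        System.out.println(Arrays.toString(textBytes("123456")));  // [49, 50, 51, 52, 53, 54]

        // 123456 is 0x01E240 in binary, so three bytes: 0x01, 0xE2, 0x40.
        // 0xE2 prints as -30 because Java bytes are signed.
        System.out.println(Arrays.toString(numberBytes(123456L))); // [1, -30, 64]
    }
}
```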
Upvotes: 2
Reputation: 500
They are different because the String type is made up of Unicode characters. The character '2' is not at all the same as the numeric value 2.
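As a quick illustration of that distinction, here is a minimal sketch; the class and method names (CharVsNumber, codeOf, digitValueOf) are hypothetical.

```java
public class CharVsNumber {
    // The numeric code of the character itself.
    static int codeOf(char c) {
        return (int) c;
    }

    // The numeric value the digit character stands for.
    static int digitValueOf(char c) {
        return Character.getNumericValue(c);
    }

    public static void main(String[] args) {
        System.out.println(codeOf('2'));       // 50
        System.out.println(digitValueOf('2')); // 2
    }
}
```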
Upvotes: 3