Reputation: 80
I'm trying to deserialize Strings from files directly and I have a question about very long Strings: Java Strings have a character count limit equal to Integer.MAX_VALUE, which is 2^31-1.
But here is my question: what happens when I have a String slightly shorter than that limit, made up of characters that take more than 1 byte in UTF-8, and I then ask Java to give me the byte array?
To make it clearer, what would happen if I could run this code? (I don't have enough RAM):
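import java.nio.charset.StandardCharsets;

String toPrint = "";
String string100 = "";
int max = Integer.MAX_VALUE - 100;
// Build a 100-character block; "ñ" is the only non-ASCII character in it.
for (int i = 0; i < 100; i += 10) {
    string100 += "1234567ñ90";
}
// Append the block until the String is close to Integer.MAX_VALUE characters.
for (int i = 0; i < max; i += 100) {
    toPrint += string100;
}
System.out.println("String complete!");
byte[] byteArray = toPrint.getBytes(StandardCharsets.UTF_8);
System.out.println(byteArray.length);
System.exit(0);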
Does it print "String complete!"? Or does it break before?
Upvotes: 2
Views: 2529
Reputation: 2490
Fundamentally, the limit on Strings is that the char arrays inside of them can't be longer than the maximum array length, which is roughly Integer.MAX_VALUE and greater than your variable max. Strings store their characters in UTF-16 and therefore the UTF-16 representation of a string can't exceed the maximum array length. The number of bytes in UTF-8 and the number of logical characters (Unicode code points, or UTF-32 characters) ultimately don't matter.
Now let's move to your particular example. Since each of the 10 characters in "1234567ñ90" is a single UTF-16 code unit, that string takes up 10 elements of a String's char array. Despite your code's horrible performance and high memory requirement, it should eventually get to "String complete!" if there is sufficient available memory. However, it will break when converting to UTF-8, because "ñ" requires two bytes in UTF-8 and the UTF-8 representation of the string is therefore longer than the maximum array length.
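To see the difference between the two counts on a small scale, here is a minimal sketch using the 10-character block from the question. The class and method names (CodeUnitsVsBytes, utf16Units, utf8Bytes) are made up for illustration, and "ñ" is written as the escape \u00f1 only to avoid source-encoding problems.

```java
import java.nio.charset.StandardCharsets;

class CodeUnitsVsBytes {
    // Number of UTF-16 code units, which is what String.length() reports.
    static int utf16Units(String s) {
        return s.length();
    }

    // Number of bytes in the UTF-8 encoding of the same string.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String block = "1234567\u00f190"; // same text as "1234567ñ90"
        System.out.println(utf16Units(block)); // prints 10
        System.out.println(utf8Bytes(block));  // prints 11
    }
}
```

The String limit applies to the first number, the byte[] limit to the second, so a String can be legal while its UTF-8 form is too large for an array.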
Upvotes: 3
Reputation: 73568
Array size is also limited to Integer.MAX_VALUE (which is why String size is limited, after all there's a char[] backing it), so it's impossible to get the byte array if the encoding uses more bytes than that, no matter what the size of the String is in characters.
The end result would be an OutOfMemoryError, but creating the String in the first place would succeed.
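The sizes involved can be checked without allocating anything, by doing the arithmetic in long. This is only a rough sketch of the numbers implied by the question's loops; the class and method names (ProjectedUtf8Size, projectedChars, projectedUtf8Bytes) are hypothetical.

```java
class ProjectedUtf8Size {
    // Characters the question's second loop would produce:
    // one 100-character append per iteration of "i < max; i += 100".
    static long projectedChars() {
        long max = Integer.MAX_VALUE - 100;
        long appends = (max + 99) / 100; // number of loop iterations, rounded up
        return appends * 100;
    }

    // Each 10-character block "1234567ñ90" is 9 ASCII bytes plus 2 bytes for "ñ".
    static long projectedUtf8Bytes() {
        return projectedChars() / 10 * 11;
    }

    public static void main(String[] args) {
        System.out.println(projectedChars());     // prints 2147483600
        System.out.println(projectedUtf8Bytes()); // prints 2362231960
        System.out.println(projectedUtf8Bytes() > Integer.MAX_VALUE); // prints true
    }
}
```

The character count stays just under Integer.MAX_VALUE (2147483647), while the UTF-8 byte count is well above it, which matches the claim that building the String could succeed but getBytes could not.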
Upvotes: 0