Reputation: 6894
Note: This question is about Java >= 9 which introduced "compact strings"
Let's say I am appending an unknown number of strings (or chars) to a StringBuilder
and at some point determine that I am appending the last string.
How can this be done efficiently?
If the capacity of the string builder is not large enough, it will always be increased to max(sb.length() + str.length(), oldCap * 2 + 2). So if you are unlucky and the capacity is not enough for the last string, it will unnecessarily double the capacity, e.g.:
StringBuilder sb = new StringBuilder(4000);
sb.append("aaa..."); // 4000 * "a"
// Last string:
sb.append("b"); // Unnecessarily increases capacity from 4000 to 8002
return sb.toString();
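To observe the growth described above, here is a minimal sketch that fills a builder exactly to its capacity and then appends one more character. The class and method names (`CapacityDemo`, `capacitiesAroundLastAppend`) are made up for illustration, and the exact growth policy is an implementation detail of the JDK rather than a documented guarantee.

```java
final class CapacityDemo {
    // Returns {capacity before the last append, capacity after it}.
    static int[] capacitiesAroundLastAppend() {
        StringBuilder sb = new StringBuilder(4000);
        for (int i = 0; i < 4000; i++) {
            sb.append('a'); // fills the builder exactly to its initial capacity
        }
        int before = sb.capacity();
        sb.append("b"); // one char too many: the backing array has to grow
        int after = sb.capacity();
        return new int[] { before, after };
    }

    public static void main(String[] args) {
        int[] caps = capacitiesAroundLastAppend();
        System.out.println(caps[0] + " -> " + caps[1]);
    }
}
```

On the OpenJDK implementations this was checked against, the second value is 8002, matching the comment in the snippet above.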
StringBuilder offers the methods capacity(), length() and getChars(...), however manually creating a char[] and then creating a string will be inefficient because:
- Due to "compact strings" the string builder has to convert its bytes to chars.
- When calling one of the String constructors the chars have to be compacted to bytes again.
Another option would be to check capacity() and if necessary create a new StringBuilder(sb.length() + str.length()), then append sb and str:
StringBuilder sb = new StringBuilder(4000);
sb.append("aaa..."); // 4000 * "a"
String str = "b";
if (sb.capacity() - sb.length() < str.length()) {
    return new StringBuilder(sb.length() + str.length())
        .append(sb)
        .append(str)
        .toString();
}
else {
    return sb.append(str).toString();
}
The only disadvantage is that if the existing string builder or the new string is non-Latin 1 (2 bytes per char), the newly created string builder has to be "inflated" from 1 byte per char (Latin 1) to 2 bytes per char.
Upvotes: 3
Views: 1715
Reputation: 120858
You are describing two separate problems IMO, but neither of them is an "actual" problem.
First is the fact that StringBuilder allocates too much space - that is rarely (if ever) a problem in practice. Think about any List/Set/Map - they do the same thing: they might allocate too much, and when you remove an element, they don't shrink their internal storage. Some of them (ArrayList, for example) do have a method for that; but so does StringBuilder: trimToSize
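A minimal sketch of trimToSize in action, with made-up names (`TrimDemo`, `capacitiesAroundTrim`). Note that the Javadoc only says the buffer "may be resized", so the capacity after the call is not strictly guaranteed.

```java
final class TrimDemo {
    // Returns {capacity before trimToSize(), capacity after it}.
    static int[] capacitiesAroundTrim() {
        StringBuilder sb = new StringBuilder(4000);
        sb.append("abc"); // uses only 3 of the 4000 reserved chars
        int before = sb.capacity();
        sb.trimToSize(); // asks the builder to release the unused storage
        int after = sb.capacity();
        return new int[] { before, after };
    }

    public static void main(String[] args) {
        int[] caps = capacitiesAroundTrim();
        System.out.println(caps[0] + " -> " + caps[1]);
    }
}
```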
Due to "compact strings" the string builder has to convert its bytes to chars.
StringBuilder knows what it is storing via the coder field in AbstractStringBuilder which it extends. With compact Strings, String holds its data in a byte[] now (it has a coder too), thus I don't understand where that conversion from byte[] to char[] is supposed to happen. StringBuilder::toString is defined as:
public String toString() {
// Create a copy, don't share the array
return isLatin1() ? StringLatin1.newString(value, 0, count)
: StringUTF16.newString(value, 0, count);
}
Notice the isLatin1 check - StringBuilder knows what type of data it has internally, so no conversion takes place when it can be avoided.
I assume that by this:
When calling one of the String constructors the chars have to be compacted to bytes again
you mean:
char [] some = ...
String s = new String(some);
I don't know why you are using "again" here, but maybe I am missing something. Just notice that this conversion from char[] to byte[] indeed has to happen, but it's fairly trivial to do (the upper 8 bits of every char have to be zero), and as soon as a single char does not meet the precondition, the entire conversion is bailed out. So you either store all characters in LATIN1, or you don't.
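The all-or-nothing check can be illustrated roughly like this. It is only a sketch of the idea, not the JDK's actual code; `Latin1Check` and `tryCompress` are hypothetical names.

```java
final class Latin1Check {
    // Hypothetical helper: returns the chars as Latin-1 bytes,
    // or null as soon as one char does not fit into a single byte.
    static byte[] tryCompress(char[] chars) {
        byte[] out = new byte[chars.length];
        for (int i = 0; i < chars.length; i++) {
            char c = chars[i];
            if (c > 0xFF) {
                return null; // upper 8 bits are not zero: give up on the whole array
            }
            out[i] = (byte) c;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tryCompress("abc".toCharArray()) != null);      // all chars fit
        System.out.println(tryCompress("a\u20ACb".toCharArray()) != null); // the euro sign does not
    }
}
```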
Upvotes: 1