Marcono1234

Reputation: 6894

Efficiently append last chars to StringBuilder

Note: This question is about Java >= 9 which introduced "compact strings"


Let's say I am appending an unknown number of strings (or chars) to a StringBuilder and at some point determine that I am appending the last string.

How can this be done efficiently?

Background

If the capacity of the string builder is not large enough, it always grows to max(length() + str.length(), oldCap * 2 + 2). So if you are unlucky and the remaining capacity is not enough for the last string, it will unnecessarily double the capacity, e.g.:

StringBuilder sb = new StringBuilder(4000);
sb.append("aaa..."); // 4000 * "a"
// Last string:
sb.append("b"); // Unnecessarily increases capacity from 4000 to 8002
return sb.toString();
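
The growth described above can be observed through capacity(). The following is a minimal sketch, not code from the question: the class and method names are made up for illustration, and the exact capacity after the resize is an implementation detail (on the OpenJDK builds I am aware of it should come out as oldCap * 2 + 2).

```java
import java.util.Arrays;

class CapacityGrowthDemo {
    /**
     * Fills a builder exactly to its initial capacity, then appends one more
     * string. Returns {capacity before the last append, capacity after it}.
     */
    static int[] capacitiesAroundLastAppend(int initialCapacity) {
        StringBuilder sb = new StringBuilder(initialCapacity);

        char[] filler = new char[initialCapacity];
        Arrays.fill(filler, 'a');
        sb.append(filler);            // builder is now exactly full

        int before = sb.capacity();
        sb.append("b");               // the "last" append forces a resize
        int after = sb.capacity();

        return new int[] {before, after};
    }

    public static void main(String[] args) {
        int[] caps = capacitiesAroundLastAppend(4000);
        System.out.println("before=" + caps[0] + ", after=" + caps[1]);
    }
}
```

Only one extra char was needed, so any capacity well above 4001 in the second value is space that toString() never uses.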

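StringBuilder offers the methods capacity(), length() and getChars(...). However, manually creating a char[] and then creating a string from it will be inefficient because:

Due to "compact strings" the string builder has to convert its bytes to chars.

When calling one of the String constructors the chars have to be compacted to bytes again.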

Another option would be to check capacity() and if necessary create a new StringBuilder(sb.length() + str.length()), then append sb and str:

StringBuilder sb = new StringBuilder(4000);
sb.append("aaa..."); // 4000 * "a"

String str = "b";
if (sb.capacity() - sb.length() < str.length()) {
    return new StringBuilder(sb.length() + str.length())
        .append(sb)
        .append(str)
        .toString();
}
else {
    return sb.append(str).toString();
}

The only disadvantage is that if the existing string builder or the new string is non-Latin-1 (2 bytes per char), the newly created string builder, which starts out as Latin-1 (1 byte per char), has to be "inflated" to 2 bytes per char.

Upvotes: 3

Views: 1715

Answers (1)

Eugene

Reputation: 120858

You are describing two separate problems IMO, but neither of them is an "actual" problem.

The first is that StringBuilder allocates too much space; that is rarely (if ever) a problem in practice. Think about any List/Set/Map: they do the same thing and might allocate too much, and when you remove an element they don't shrink their internal storage. Some of them (ArrayList, for example) have a method for that; but so does StringBuilder:

 trimToSize
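
As a rough illustration of trimToSize (a sketch with made-up class and method names, not something taken from the JDK), the capacity can be compared before and after the call. The Javadoc only promises an attempt to reduce storage, so the exact value afterwards is an assumption about the implementation.

```java
class TrimToSizeDemo {
    /** Returns {capacity before trimToSize, capacity after trimToSize}. */
    static int[] capacityBeforeAndAfterTrim() {
        StringBuilder sb = new StringBuilder(4000);
        sb.append("short");           // uses only 5 of the 4000 slots

        int before = sb.capacity();
        sb.trimToSize();              // asks the builder to drop unused space
        int after = sb.capacity();

        return new int[] {before, after};
    }

    public static void main(String[] args) {
        int[] caps = capacityBeforeAndAfterTrim();
        System.out.println("before=" + caps[0] + ", after=" + caps[1]);
    }
}
```

Keep in mind that shrinking may itself copy the buffer, so this is mainly useful for a builder that is kept around, not one that is discarded right after toString().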

Second, you say:

Due to "compact strings" the string builder has to convert its bytes to chars.

StringBuilder knows what it is storing via the coder field in AbstractStringBuilder which it extends. With compact Strings, String holds its data in a byte[] now (it has a coder too), thus I don't understand where that conversion from byte[] to char[] is supposed to happen. StringBuilder::toString is defined as:

public String toString() {
    // Create a copy, don't share the array
    return isLatin1() ? StringLatin1.newString(value, 0, count)
                      : StringUTF16.newString(value, 0, count);
}

Notice the isLatin1 check - StringBuilder knows what type of data it has internally; thus no conversion when possible.

I assume that by this:

When calling one of the String constructors the chars have to be compacted to bytes again

you mean:

char [] some = ...
String s = new String(some);

I don't know why you say "again" here, but maybe I am missing something. Note that this conversion from char[] to byte[] does indeed have to happen, but it is fairly cheap: a char fits into a single byte only if its upper 8 bits are zero, and as soon as a single char does not meet that precondition, the whole compression attempt is abandoned. So you either store all characters as LATIN1, or you don't.
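
To make the "all or nothing" idea concrete, here is a simplified sketch of such a compression attempt. It is not the JDK's actual code; the class name CompressSketch and the method tryCompress are hypothetical and only illustrate the bail-out on the first char that does not fit into one byte.

```java
class CompressSketch {
    /**
     * Tries to store every char in a single byte (Latin-1).
     * Returns null as soon as one char needs more than 8 bits,
     * in which case the caller would keep 2 bytes per char.
     */
    static byte[] tryCompress(char[] chars) {
        byte[] latin1 = new byte[chars.length];
        for (int i = 0; i < chars.length; i++) {
            char c = chars[i];
            if (c > 0xFF) {
                return null;          // upper 8 bits are not zero: give up
            }
            latin1[i] = (byte) c;     // upper 8 bits are zero: keep the low byte
        }
        return latin1;
    }

    public static void main(String[] args) {
        System.out.println(tryCompress("abc".toCharArray()) != null);     // all chars fit
        System.out.println(tryCompress("a\u20AC".toCharArray()) != null); // euro sign does not fit
    }
}
```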

Upvotes: 1
