Philip Paek
Philip Paek

Reputation: 281

Java StringBuilder(StringBuffer)'s ensureCapacity(): Why is it doubled and incremented by 2?

I have searched about this, but I couldn't find why StringBuilder's ensureCapacity() method won't lengthen the old capacity by just doubling but also adding two.

So, when default capacity of 16 is full, next lengthened value will be 34 unless whole string length doesn't exceed 34. Why shouldn't it be 32?

My best guess is considering for a null character, '\u0000', but I'm not sure. Can anyone tell me why?

Upvotes: 27

Views: 1717

Answers (5)

Abhishek
Abhishek

Reputation: 157

 public void ensureCapacity(int minimumCapacity) {
     if (minimumCapacity > value.length) {
         expandCapacity(minimumCapacity);
     }
 }



 void expandCapacity(int minimumCapacity) {
     int newCapacity = (value.length + 1) * 2;
     if (newCapacity < 0) {
         newCapacity = Integer.MAX_VALUE;
     } else if (minimumCapacity > newCapacity) {
         newCapacity = minimumCapacity;
     }
    value = Arrays.copyOf(value, newCapacity);
}

NOTE: value.length is the capacity of the StringBuffer, not the length.

It has nothing to do with a null string because minimum capacity is 16.

What I think is, the memory allocations cost so much time, and if we are calling ensureCapacity() frequently with increasing minimumCapacity , (capacity +1)*2 will allocate a bit more memory and may reduce further allocations and save some time.

lets consider initial capacity as 16,

only with doubling 16 , 32 , 64 , 128 , 256 , 512 , 1024 , 2048 , so on...

with double +2 16 , 34 , 70 , 142 , 286 , 574 , 1150 , 2302 , so on...

Thus the memory will gradually keeping increasing every time and may decrease the no of allocations of memory.

Upvotes: 0

Honza Zidek
Honza Zidek

Reputation: 20296

I downloaded the original source code of Java 1.5 from the Oracle web and it contains the following lines:

/**
 * This implements the expansion semantics of ensureCapacity with no
 * size check or synchronization.
 */
void expandCapacity(int minimumCapacity) {
    int newCapacity = (value.length + 1) * 2;
    if (newCapacity < 0) {
        newCapacity = Integer.MAX_VALUE;
    } else if (minimumCapacity > newCapacity) {
        newCapacity = minimumCapacity;
    }   
    char newValue[] = new char[newCapacity];
    System.arraycopy(value, 0, newValue, 0, count);
    value = newValue;
} 

So at least two things are clear:

  • the theory that other corrections were additionally added is false (e.g. "the odd (double + 2) semantics made more sense when it was the only line in the function" is not true)
  • most probably it was originally meant as "let's make room for at least one more character and let's multiply it by two"

Upvotes: 0

Gregory.K
Gregory.K

Reputation: 1337

I suppose that object alignment is a key, because length * 2 + 2-strategy is memory-effective (see explanation below).

Let's consider HotSpot JVM.

First of all, java objects are 8-bytes aligned and char array is not an exception.

Secondly, sizeof(object header) equals 8 bytes on 32-bit JVM and 16 bytes on 64-bit JVM with -XX:-UseCompressedOops.

Thus, object body should be aligned by 8 bytes:
objectBodySize(charArray) == sizeOf(arrayLength) + sizeOf(arrayValues) == (4 bytes) + (arrayLength * 2 bytes).

If old array length is even, then new array length will always give zero-size alignment.

Examples:

  1. oldCharArrayLength == 6 then newCharArrayLength == 14 and objectBodySize(newCharArray) == 4 + 14 * 2 == 32

  2. oldCharArrayLength == 4 then newCharArrayLength == 10 and objectBodySize(newCharArray) == 4 + 10 * 2 == 24

It's important to note that -XX:+UseCompressedOops flag is available since 1.6 whereas StringBuilder and AbstractStringBuilder are available since 1.5. It means that the strategy above with two additional chars has zero-cost of memory on 64-bit JVM before 1.6, whereas sizeof(object header) == 12 bytes when run on 64-bit JVM with -XX:+UseCompressedOops.

Upvotes: 0

Ralf Kleberhoff
Ralf Kleberhoff

Reputation: 7290

I think the reason is a combination of

  • some ancient ;-) heuristic strategy how to extend the capacity, especially for short buffers,

  • documenting this strategy in earliest java API docs,

  • Sun/Oracle being very careful to stick to once-documented behaviour.

StringBuilder shares this method with its predecessor StringBuffer, which reads (probably from the earliest beginnings, at least in j2sdk1.4_02, which happened to still exist in some archive folder on my machine):

/**
 * Ensures that the capacity of the buffer is at least equal to the
 * specified minimum.
 * If the current capacity of this string buffer is less than the 
 * argument, then a new internal buffer is allocated with greater 
 * capacity. The new capacity is the larger of: 
 * <ul>
 * <li>The <code>minimumCapacity</code> argument. 
 * <li>Twice the old capacity, plus <code>2</code>. 
 * </ul>
 * If the <code>minimumCapacity</code> argument is nonpositive, this
 * method takes no action and simply returns.
 *
 * @param   minimumCapacity   the minimum desired capacity.
 */
public synchronized void ensureCapacity(int minimumCapacity) {
    if (minimumCapacity > value.length) {
        expandCapacity(minimumCapacity);
    }
}

And it documents exactly the times-two plus-two behaviour, so even if in the meantime some JRE developer had found a better strategy, there's no chance to implement it here because it wouldn't conform to the documentation.

Upvotes: 0

Edwin Buck
Edwin Buck

Reputation: 70949

I believe it has to do with a simple, if somewhat dumb, way to ensure the corner case of very small strings.

For example, if I have the string

""

and I double it only, I will not have a sufficient size to store anything else in it. If I double it and add a small constant number of spaces, I can assure that my new value is larger than my old one.

Why increment it by two then? Probably a small performance improvement. By adding two instead of 1, I can avoid an intermediate expansion for small expansions (0 to 10 chars detailed below)

"" => expand => "1" => expand => "123" expand => "1234567" expand => "123456789012345"

which is 4 expands compared to

"" => expand => "12" => expand => "123456" => expand => "123456789012"

which is 3 expands. This also works nicely for one char strings (expanding to 10 chars)

"1" => expand => "1234" => expand => "1234567890"

while the 1 char expansion routine looks like

"1" => expand => "123" => expand => "1234567" => expand => "123456789012345"

Finally, an added increment of two tends to word align about 50% of the time, while added increments of one or three would do so about 25% of the time. While this might not seem like a big deal, some architectures cannot accommodate non-aligned reads without expensive interrupt calls to rewrite the read in the CPU, leading to all sorts of performance issues.

Upvotes: 8

Related Questions