Adam Stelmaszczyk

Reputation: 19837

Is using own int capacity faster than using .length field of an array?

In the "95% of performance is about clean representative models" talk by Martin Thompson, between 17 and 21 minute, such code is presented:

public class Queue
{
    private final Object[] buffer;
    private final int capacity;

    // Rest of the code

}

At 20:16 he says:

You can get much better performance, so leaving things like capacity in there is the right thing to do.

I tried to come up with a code example in which capacity would be much faster than buffer.length, but I failed.

Martin says that problems arise in two scenarios:

  1. In a concurrent world. But the length field is also final (JLS 10.7), so I don't see how this could be a problem.
  2. When a cache misses. I tried calling capacity vs buffer.length one million times (with the queue holding one million elements), but there was no significant difference. I used JMH for benchmarking; a sketch of the kind of comparison I ran is shown below.
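
For reference, this is roughly the kind of comparison I benchmarked (the class, method, and field names here are only illustrative, not my exact code); both variants scored about the same for me:

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
public class LengthVsCapacity {

    private Object[] buffer;
    private int capacity;

    @Setup
    public void setup() {
        capacity = 1_000_000;
        buffer = new Object[capacity];
    }

    // Loop bound read from buffer.length.
    @Benchmark
    public int loopOverLength() {
        int nulls = 0;
        for (int i = 0; i < buffer.length; i++) {
            if (buffer[i] == null) {
                nulls++;
            }
        }
        return nulls;
    }

    // Loop bound read from the separate capacity field.
    @Benchmark
    public int loopOverCapacity() {
        int nulls = 0;
        for (int i = 0; i < capacity; i++) {
            if (buffer[i] == null) {
                nulls++;
            }
        }
        return nulls;
    }
}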

Could you please provide a code example that demonstrates a case in which capacity is superior to buffer.length in terms of performance?

The more common the case (frequently spotted in real code), the better.

Please note that I'm completely setting aside aesthetics, clean code, potential for refactoring, etc. I'm asking only about performance.

Upvotes: 11

Views: 285

Answers (4)

Has QUIT--Anony-Mousse

Reputation: 77454

I doubt this will have any positive performance impact. It won't help with eliminating bounds checks in HotSpot, for example. Even worse: it may be faster on one JVM, but on the next version it may hurt. Java keeps getting additional optimizations, and array bounds checks are one thing the JVM developers try hard to optimize away...

I believe this may be a leftover from rewriting real Queue code to create a simpler example. In a real queue you need to track the used capacity, and sometimes you want to impose an upper bound on the capacity (to block producers when consumers cannot keep up). If you had such code (with setCapacity/getCapacity and a non-final capacity) and simplified it by removing the resize logic and making the backing storage final, this is what you might end up with.
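
For illustration, a hypothetical fuller version (the names here are invented) that collapses into the snippet from the talk once you strip the resize logic and make the fields final:

import java.util.Arrays;

public class BoundedQueue {

    private Object[] buffer;
    private int capacity;   // upper bound, used to block producers

    public BoundedQueue(int capacity) {
        this.buffer = new Object[capacity];
        this.capacity = capacity;
    }

    public int getCapacity() {
        return capacity;
    }

    public void setCapacity(int newCapacity) {
        // The resize logic that disappears in the simplified example.
        this.buffer = Arrays.copyOf(buffer, newCapacity);
        this.capacity = newCapacity;
    }

    // Rest of the queue code (enqueue/dequeue, used-capacity tracking)
}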

Upvotes: 0

apangin

Reputation: 98284

When you access an array normally, the JVM uses its length anyway to perform the bounds check. But when you access the array via sun.misc.Unsafe (as Martin does), you don't have to pay this implicit penalty.

An array's length field usually lies in the same cache line as its first elements, so you will get false sharing when multiple threads write to the first indices concurrently. Using a separate field for the buffer capacity avoids this false sharing.

Here is a benchmark that shows how the capacity field makes array access substantially faster:

package bench;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Threads;
import sun.misc.Unsafe;

import java.lang.reflect.Field;
import java.util.concurrent.atomic.AtomicReferenceArray;

@State(Scope.Benchmark)
@Threads(4)
public class Queue {
    private static final Unsafe unsafe = getUnsafe();
    private static final long base = unsafe.arrayBaseOffset(Object[].class);
    private static final int scale = unsafe.arrayIndexScale(Object[].class);

    private AtomicReferenceArray<Object> atomic;
    private Object[] buffer;
    private int capacity;

    @Param({"0", "25"})
    private volatile int index;

    @Setup
    public void setup() {
        capacity = 32;
        buffer = new Object[capacity];
        atomic = new AtomicReferenceArray<>(capacity);
    }

    @Benchmark
    public void atomicArray() {
        // Baseline: AtomicReferenceArray does its own bounds check against the array length.
        atomic.set(index, "payload");
    }

    @Benchmark
    public void unsafeArrayLength() {
        // Bounds check against buffer.length: reads the array header,
        // which shares a cache line with the first elements.
        int index = this.index;
        if (index < 0 || index >= buffer.length) {
            throw new ArrayIndexOutOfBoundsException();
        }
        unsafe.putObjectVolatile(buffer, base + index * scale, "payload");
    }

    @Benchmark
    public void unsafeCapacityField() {
        // Bounds check against the separate capacity field:
        // does not touch the array header at all.
        int index = this.index;
        if (index < 0 || index >= capacity) {
            throw new ArrayIndexOutOfBoundsException();
        }
        unsafe.putObjectVolatile(buffer, base + index * scale, "payload");
    }

    private static Unsafe getUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (IllegalAccessException | NoSuchFieldException e) {
            throw new AssertionError("Should not happen");
        }
    }
}

Results:

Benchmark                  (index)   Mode  Cnt      Score      Error   Units
Queue.atomicArray                0  thrpt    5  41804,825 ±  928,882  ops/ms
Queue.atomicArray               25  thrpt    5  84713,201 ± 1067,911  ops/ms
Queue.unsafeArrayLength          0  thrpt    5  48656,296 ±  676,166  ops/ms
Queue.unsafeArrayLength         25  thrpt    5  88812,863 ± 1089,380  ops/ms
Queue.unsafeCapacityField        0  thrpt    5  88904,433 ±  360,936  ops/ms
Queue.unsafeCapacityField       25  thrpt    5  88633,490 ± 1426,329  ops/ms

Upvotes: 9

Si.

Reputation: 81

I'm no JVM expert and do not claim to understand its optimisation.

Have you considered looking at the bytecode to see which instructions are executed?

public class Queue {

    private final Object[] buffer;
    private final int capacity;

    public Queue(int size) {
        buffer = new Object[size];
        this.capacity = size;
    }

    public static void main(String... args) {
        Queue q = new Queue(10);
        int c = q.capacity;
        int l = q.buffer.length;
    }
}

This is the disassembled bytecode for the main method above (obtained with, e.g., javap -c -v Queue).

public static void main(java.lang.String...);
    descriptor: ([Ljava/lang/String;)V
    flags: ACC_PUBLIC, ACC_STATIC, ACC_VARARGS
    Code:
      stack=3, locals=4, args_size=1
         0: new           #5                  // class Queue
         3: dup
         4: bipush        10
         6: invokespecial #6                  // Method "<init>":(I)V
         9: astore_1

        10: aload_1
        11: getfield      #4                  // Field capacity:I
        14: istore_2

        15: aload_1
        16: getfield      #3                  // Field buffer:[Ljava/lang/Object;
        19: arraylength

        20: istore_3
        21: return

We see that both accesses involve a getfield instruction; however, buffer.length requires an additional arraylength instruction.

Looking at the JVM specification for arraylength:

instructionIsTypeSafe(arraylength, Environment, _Offset, StackFrame,
                      NextStackFrame, ExceptionStackFrame) :- 
    nth1OperandStackIs(1, StackFrame, ArrayType),
    arrayComponentType(ArrayType, _),
    validTypeTransition(Environment, [top], int, StackFrame, NextStackFrame),
    exceptionStackFrame(StackFrame, ExceptionStackFrame).

nth1OperandStackIs - checks that the incoming operand is a reference type and that it references an array. If the array reference is null, a NullPointerException is thrown.

arrayComponentType - checks the type of the elements. The component type of an array of X is X.

validTypeTransition - Type checking rules

So, calling length on an array involves the additional arraylength instruction. I'm very interested to learn more about this question.

Upvotes: 0

leventov

Reputation: 15263

You shouldn't take Martin's words at face value. When he says "Using array.length is an anti-pattern that is copied over projects", I think it's slyness.

Using the capacity field indeed improves locality, pollutes the caches less, and helps to avoid false sharing, but it requires writing really horrible source code that is very far from the "clean and simple" style Martin advertised in this talk.

The problem is that even if you don't write array.length in your source directly, the JVM accesses the length (that is, the array header) anyway on each array indexing array[i], to check the bounds. The HotSpot JVM has issues with eliminating bounds checks even in "simple" looping cases, and I think it is not able to interpret an "external" check like if (i < capacity) return array[i]; as the bounds check, i.e. to tie the capacity field to the array size.

That is why, for the capacity pattern to make any sense, you need to access the array only via Unsafe! That, unfortunately, disables many bulk loop optimizations.

Look at Martin's "clean" queue implementation :)

I can also try to explain what was meant by the concurrency considerations around accessing the "final" array.length. My experiments show that even "read-read" concurrent cache-line access introduces a sort of "false sharing" and slows things down. (I think the JVM engineers considered this when they made @sun.misc.Contended pad 128 bytes on both sides of the contended fields; probably this is to ensure that both two-sided cache-line prefetch and "read-read false sharing" won't affect performance.)

That is why, when queue consumers and producers access the capacity to wrap around the buffer, they are better off accessing different objects that contain the same (by value) capacity field and a reference to the same array. When they access this array via Unsafe, producers and consumers normally touch different areas of it and don't share anything falsely.
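
A rough sketch of that layout (the class and field names here are made up, just to illustrate the idea):

// Producer and consumer each work through their own view object that holds a
// private copy of the capacity and a reference to the shared buffer, so the
// fields they mutate live on different cache lines.
final class ProducerView {
    final int capacity;     // same value as ConsumerView.capacity
    final Object[] buffer;  // same array instance as ConsumerView.buffer
    long head;              // mutated only by the producer

    ProducerView(int capacity, Object[] buffer) {
        this.capacity = capacity;
        this.buffer = buffer;
    }
}

final class ConsumerView {
    final int capacity;
    final Object[] buffer;
    long tail;              // mutated only by the consumer

    ConsumerView(int capacity, Object[] buffer) {
        this.capacity = capacity;
        this.buffer = buffer;
    }
}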

IMO the anti-pattern now is trying to implement yet another queue while the people behind https://github.com/JCTools/JCTools (including Martin, btw) optimize this to death.

Upvotes: 3
