Adam Hughes

Reputation: 16319

Why is disruptor slower with smaller ring buffer?

Following the Disruptor Getting Started Guide, I've built a minimal disruptor with a single producer and single consumer.

Producer

import com.lmax.disruptor.RingBuffer;

public class LongEventProducer
{
    private final RingBuffer<LongEvent> ringBuffer;

    public LongEventProducer(RingBuffer<LongEvent> ringBuffer)
    {
        this.ringBuffer = ringBuffer;
    }

    public void onData()
    {
        long sequence = ringBuffer.next();
        try
        {
            LongEvent event = ringBuffer.get(sequence); // claim the slot; nothing is written to the event on purpose
        }
        finally
        {
            ringBuffer.publish(sequence);
        }
    }
}

Consumer (notice that the handler does nothing in onEvent)

import com.lmax.disruptor.EventHandler;

public class LongEventHandler implements EventHandler<LongEvent>
{
    public void onEvent(LongEvent event, long sequence, boolean endOfBatch)
    {}
}

My goal was to performance-test one pass around a large ring buffer against multiple passes around a smaller ring. In each case the total number of ops (bufferSize x rotations) is the same. What I found was that the ops/sec rate fell drastically as the ring buffer got smaller.

RingBuffer Size | Revolutions | Total Ops | Mops/sec
        1048576 |           1 |   1048576 |    50-60
           1024 |        1024 |   1048576 |     8-16
             64 |       16384 |   1048576 |  0.5-0.7
              8 |      131072 |   1048576 | 0.12-0.14

Question: what is the reason for the massive degradation in performance when the ring buffer size is reduced but the total number of iterations is fixed? The trend is independent of the WaitStrategy and of single vs. multi-producer: absolute throughput is lower, but the trend is the same.

Main (notice SingleProducer and BusySpinWaitStrategy)

import com.lmax.disruptor.BusySpinWaitStrategy;
import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.RingBuffer;
import com.lmax.disruptor.dsl.ProducerType;

import java.util.concurrent.Executor;
import java.util.concurrent.Executors;

public class LongEventMainJava {
    static double ONEMILLION = 1000000.0;
    static double ONEBILLION = 1000000000.0;

    public static void main(String[] args) throws Exception {
            // Executor that will be used to construct new threads for consumers
            Executor executor = Executors.newCachedThreadPool();    

            // TUNABLE PARAMS
            int ringBufferSize = 1048576; // 1024, 64, 8
            int rotations = 1; // 1024, 16384, 131072

            // Construct the Disruptor
            Disruptor<LongEvent> disruptor = new Disruptor<>(new LongEventFactory(), ringBufferSize, executor, ProducerType.SINGLE, new BusySpinWaitStrategy());

            // Connect the handler
            disruptor.handleEventsWith(new LongEventHandler());

            // Start the Disruptor, starts all threads running
            disruptor.start();

            // Get the ring buffer from the Disruptor to be used for publishing.
            RingBuffer<LongEvent> ringBuffer = disruptor.getRingBuffer();
            LongEventProducer producer = new LongEventProducer(ringBuffer);

            long start = System.nanoTime();
            long totalIterations = (long) rotations * ringBufferSize; // cast avoids int overflow for larger parameter combinations
            for (long i = 0; i < totalIterations; i++) {
                producer.onData();
            }
            double duration = (System.nanoTime()-start)/ONEBILLION;
            System.out.println(String.format("Buffersize: %s, rotations: %s, total iterations = %s, duration: %.2f seconds, rate: %.2f Mops/s",
                    ringBufferSize, rotations, totalIterations, duration, totalIterations/(ONEMILLION * duration)));
        }
}

And to run, you'll need the trivial Factory code

import com.lmax.disruptor.EventFactory;

public class LongEventFactory implements EventFactory<LongEvent>
{
    public LongEvent newInstance()
    {
        return new LongEvent();
    }
}

Running on a Core i5-2400, 12 GB RAM, Windows 7

Sample Output

Buffersize: 1048576, rotations: 1, total iterations = 1048576, duration: 0.02 seconds, rate: 59.03 Mops/s

Buffersize: 64, rotations: 16384, total iterations = 1048576, duration: 2.01 seconds, rate: 0.52 Mops/s

Upvotes: 3

Views: 3209

Answers (2)

Sotirios Delimanolis

Reputation: 280141

When the producer fills up the ring buffer, it has to wait until events are consumed before it can proceed.

When your buffer is exactly as large as the number of elements you will put in, the producer never has to wait; it can never wrap. All it is doing is incrementing a count (the sequence) and publishing the data into the ring buffer slot at that index.

When your buffer is smaller, the producer is still just incrementing a count and publishing, but it does so faster than the consumer can consume. The producer therefore has to wait until elements are consumed and space in the ring buffer is freed up, as the sketch below illustrates.
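To make that concrete, here is a minimal sketch (illustrative names and an assumed consumer-sequence counter, not the actual Disruptor source) of how a single producer claims the next slot. With a small buffer, the wait loop is entered on almost every revolution:

import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;

public class ClaimSketch
{
    // The producer may only claim slot nextValue once the consumer is no more
    // than bufferSize slots behind it; otherwise it would overwrite an event
    // that has not yet been consumed.
    static long claimNext(long nextValue, long bufferSize, AtomicLong consumerSequence)
    {
        long wrapPoint = nextValue - bufferSize; // the slot we would overwrite

        // Large buffer: wrapPoint trails the consumer and this loop never runs.
        // Small buffer: the producer lands here on nearly every revolution.
        while (wrapPoint > consumerSequence.get())
        {
            LockSupport.parkNanos(1L); // wait for the consumer to catch up
        }
        return nextValue;
    }
}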

Upvotes: 6

Adam Hughes

Reputation: 16319

It seems the issue lies in this block of code in com.lmax.disruptor.SingleProducerSequencer:

if (wrapPoint > cachedGatingSequence || cachedGatingSequence > nextValue)
{
    cursor.setVolatile(nextValue);  // StoreLoad fence

    long minSequence;
    while (wrapPoint > (minSequence = Util.getMinimumSequence(gatingSequences, nextValue)))
    {
        waitStrategy.signalAllWhenBlocking();
        LockSupport.parkNanos(1L); // TODO: Use waitStrategy to spin?
    }

    this.cachedValue = minSequence;
}

In particular, the call to LockSupport.parkNanos(1L): on Windows a park can take up to ~15 ms because of the default timer interrupt period. Whenever the producer catches up to the end of the buffer and has to wait on the consumer, this gets called.
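To see how expensive that park actually is on a given machine, a standalone probe like the following (my own throwaway harness, not part of the benchmark) times it directly:

import java.util.concurrent.locks.LockSupport;

public class ParkNanosProbe
{
    public static void main(String[] args)
    {
        // parkNanos(1L) asks for 1 ns, but the real sleep is bounded by the
        // OS timer resolution; on Windows that can be milliseconds.
        for (int i = 0; i < 10; i++)
        {
            long start = System.nanoTime();
            LockSupport.parkNanos(1L);
            System.out.println("parkNanos(1) took " + (System.nanoTime() - start) / 1000 + " us");
        }
    }
}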

Secondly, when the buffer is small, false sharing within the RingBuffer is likely occurring: with only a handful of slots, adjacent entries land on the same cache line, so the producer and consumer threads keep invalidating each other's caches. I surmise both of these effects are in play; a rough demo of false sharing follows.
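For reference, here is a rough, self-contained false-sharing demo (hypothetical code, not Disruptor internals; timings depend on cache-line size, JVM layout, and warm-up order):

public class FalseSharingSketch
{
    // Index 0 and 1 normally share a 64-byte cache line; index 0 and 16
    // (128 bytes apart) normally do not.
    static final long[] slots = new long[17];

    public static void main(String[] args) throws InterruptedException
    {
        System.out.println("adjacent (same cache line): " + time(0, 1) + " ms");
        System.out.println("padded (separate lines):    " + time(0, 16) + " ms");
    }

    static long time(int a, int b) throws InterruptedException
    {
        Thread t1 = new Thread(() -> { for (long i = 0; i < 100_000_000L; i++) slots[a]++; });
        Thread t2 = new Thread(() -> { for (long i = 0; i < 100_000_000L; i++) slots[b]++; });
        long start = System.currentTimeMillis();
        t1.start(); t2.start();
        t1.join(); t2.join();
        return System.currentTimeMillis() - start;
    }
}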

Finally, I was able to speed up the code by warming up the JIT with one million calls to onData() before benchmarking. This got the best case above 80 Mops/sec, but did not remove the degradation as the buffer shrank.
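The warm-up itself was just a plain loop inserted into main() before the timed section (a sketch of the change, placed right before long start = System.nanoTime();):

// Warm-up: exercise the hot path so the JIT compiles onData() and the
// Disruptor internals before measurement begins.
for (long i = 0; i < 1000000L; i++)
{
    producer.onData();
}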

Upvotes: 0
