pivotal-jbarrett

Reputation: 66

Is it expected to see Direct NIO ByteBuffer outperforming Direct Netty ByteBuf when writing bytes?

I wrote a JMH benchmark to diagnose some odd throughput issues I was seeing when using Netty buffers in place of NIO buffers. The Netty direct ByteBuf is significantly slower at writing byte by byte than its NIO ByteBuffer counterpart. Even more interesting, if I get an NIO ByteBuffer from the Netty ByteBuf, the performance is on par with NIO ByteBuffer. So I can be certain the problem isn't the underlying direct memory or the internal ByteBuffer, but something in the layers of ByteBuf. Is this expected? Am I using it wrong?

Here are the raw results.

Benchmark                                               Mode  Cnt       Score   Error  Units
ByteBufferBenchmark.directByteBuffer                   thrpt    2  206815.012          ops/s
ByteBufferBenchmark.heapByteBuffer                     thrpt    2  159197.697          ops/s
ByteBufferBenchmark.pooledDirectByteBuf                thrpt    2  120753.217          ops/s
ByteBufferBenchmark.pooledDirectByteBufAsByteBuffer    thrpt    2  204986.976          ops/s
ByteBufferBenchmark.pooledHeapByteBuf                  thrpt    2  121846.543          ops/s
ByteBufferBenchmark.pooledHeapByteBufAsByteBuffer      thrpt    2  159503.425          ops/s
ByteBufferBenchmark.unpooledDirectByteBuf              thrpt    2  121781.355          ops/s
ByteBufferBenchmark.unpooledDirectByteBufAsByteBuffer  thrpt    2  208623.215          ops/s
ByteBufferBenchmark.unpooledHeapByteBuf                thrpt    2  158904.532          ops/s
ByteBufferBenchmark.unpooledHeapByteBufAsByteBuffer    thrpt    2  160171.685          ops/s


directByteBuffer = ByteBuffer.allocateDirect
heapByteBuffer = ByteBuffer.allocate
*DirectByteBuf = ByteBufAllocator.directBuffer
*HeapByteBuf = ByteBufAllocator.heapBuffer
pooled* = PooledByteBufAllocator.DEFAULT
unpooled* = UnpooledByteBufAllocator.DEFAULT
*AsByteBuffer = ByteBuf.nioBuffer
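
The full benchmark source isn't included here, so for context this is a minimal sketch of what the byte-by-byte write benchmarks above could look like; the buffer size, field names, and the subset of variants shown are assumptions, not the original code.

import io.netty.buffer.ByteBuf;
import io.netty.buffer.PooledByteBufAllocator;
import org.openjdk.jmh.annotations.*;

import java.nio.ByteBuffer;
import java.util.concurrent.TimeUnit;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.SECONDS)
public class ByteBufferBenchmark {

    private static final int SIZE = 1024; // assumed buffer size

    private ByteBuffer nioDirect;
    private ByteBuf nettyPooledDirect;

    @Setup
    public void setup() {
        nioDirect = ByteBuffer.allocateDirect(SIZE);
        nettyPooledDirect = PooledByteBufAllocator.DEFAULT.directBuffer(SIZE, SIZE);
    }

    @TearDown
    public void tearDown() {
        nettyPooledDirect.release(); // return the pooled buffer to the allocator
    }

    @Benchmark
    public ByteBuffer directByteBuffer() {
        nioDirect.clear();
        for (int i = 0; i < SIZE; i++) {
            nioDirect.put((byte) i); // byte-by-byte write through NIO
        }
        return nioDirect;
    }

    @Benchmark
    public ByteBuf pooledDirectByteBuf() {
        nettyPooledDirect.clear();
        for (int i = 0; i < SIZE; i++) {
            nettyPooledDirect.writeByte(i); // byte-by-byte write through ByteBuf
        }
        return nettyPooledDirect;
    }

    @Benchmark
    public ByteBuffer pooledDirectByteBufAsByteBuffer() {
        ByteBuffer nio = nettyPooledDirect.nioBuffer(0, SIZE);
        for (int i = 0; i < SIZE; i++) {
            nio.put((byte) i); // same memory, written through the NIO view
        }
        return nio;
    }
}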

Upvotes: 1

Views: 521

Answers (1)

Francesco Nigro

Reputation: 191

Please share the whole code (and the JDK version) if possible; otherwise I cannot tell in which operations the Netty buffers seem to be slower.

As a general rule of thumb: JDK classes can (and many of them very likely will) benefit from being "good citizens" of the JVM itself, i.e. their operations are intrinsified (see the general concept at https://en.m.wikipedia.org/wiki/Intrinsic_function) and, as a consequence, their optimized code is inlined (see http://normanmaurer.me/blog/2014/05/15/Inline-all-the-Things/). In short, NIO ByteBuffer plays "dirty" against Netty ByteBuf: depending on the version of the JVM, it benefits from several optimizations that are simply not accessible to regular user-defined data types.

Returning to your question: yes, it can be expected, depending on the operations and the calling context (which influences inlining of the ByteBuf operations, the chances of vectorization, and bounds-check elimination). I've recently fixed a related issue on Netty in https://github.com/netty/netty/pull/10368 : feel free to dive into the long list of comments there; I am sure it will help to answer your question.
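
If byte-by-byte writes really are your hot path, one workaround consistent with the *AsByteBuffer numbers in your results is to do the tight loop through the ByteBuf's NIO view and advance the writer index yourself. This is only an illustrative sketch (the helper name and length handling are mine, not from your code or from Netty):

import io.netty.buffer.ByteBuf;
import java.nio.ByteBuffer;

final class NioViewWrite {

    // Write 'length' bytes through the buffer's NIO view, which shares the same
    // memory; the caller must ensure 'length' writable bytes are available.
    // Writes to the view do not move the ByteBuf's writerIndex, so advance it manually.
    static void fillByteByByte(ByteBuf buf, int length) {
        ByteBuffer view = buf.nioBuffer(buf.writerIndex(), length);
        for (int i = 0; i < length; i++) {
            view.put((byte) i);
        }
        buf.writerIndex(buf.writerIndex() + length);
    }
}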

Upvotes: 2
