Reputation: 10702
Java is big-endian and the network stack is big-endian, but Intel/AMD CPUs (basically all of our computers) and ARM CPUs (the most common Android and iOS chips) are all little-endian.
Given all of that, if I am allocating a direct ByteBuffer for different uses, is it a good idea to always try and match the endian-ness of the native interaction?
More specifically:
and so on...
I am asking this because I have never thought about the Endian-ness of my ByteBuffers, but after seeing some other questions on SO and the performance impact it can have, it seems worth it or at least something I should become more aware of when utilizing ByteBuffers.
Or is there a downside to worrying about endian-ness that I am missing and should be aware of?
Upvotes: 4
Views: 2706
Reputation: 533560
The article you quote points out that the difference is pretty small (possibly none).
The results quoted don't show a consistent improvement, and using the latest JVMs may close the gap.
mmap: 1.358 bytebuffer: 0.922 regular i/o: 1.387
mmap: 1.336 bytebuffer: 1.62 regular i/o: 1.467
There is a measured difference, but it's small in the overall scheme of things. If you want it to be much faster, the only option I have found that makes a big difference is using Unsafe directly, in which case only native ordering is available.
Even then it only helps in the most latency sensitive applications.
Unsafe is even more interesting with code and comments. ;)
http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7-b147/sun/misc/Unsafe.java
Upvotes: 2
Reputation: 21086
If you are using your ByteBuffer to read and store BYTES, then byte order doesn't matter at all; just use the default.
If you are reading and writing non-byte primitive types (short, int, float, long, double), then the underlying processor has to do extra work when the CPU's endianness (ByteOrder.nativeOrder()) differs from Java's default big-endian order. If you read the other SO questions you linked to, you can deduce why: the processor has to flip the bytes before it can do any work with those primitive types. This byte flipping (swapping) uses up some CPU cycles.
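To make the default-versus-native distinction concrete, here is a minimal sketch of how the order is chosen on a direct buffer and what it does to the stored bytes. The class name ByteOrderDemo and the helper rawBytes are made up for illustration; only the java.nio calls are real API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

class ByteOrderDemo {
    // Hypothetical helper: writes one int into a fresh direct buffer
    // using the given order and returns the four raw bytes as stored.
    static byte[] rawBytes(int value, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.allocateDirect(Integer.BYTES).order(order);
        buf.putInt(0, value);
        byte[] out = new byte[Integer.BYTES];
        for (int i = 0; i < out.length; i++) {
            out[i] = buf.get(i);
        }
        return out;
    }

    public static void main(String[] args) {
        // A newly created ByteBuffer is always BIG_ENDIAN, whatever the CPU is.
        System.out.println("default order: " + ByteBuffer.allocateDirect(4).order());
        // The platform's own order, e.g. LITTLE_ENDIAN on x86.
        System.out.println("native order:  " + ByteOrder.nativeOrder());

        System.out.println("big-endian 1:    " + Arrays.toString(rawBytes(1, ByteOrder.BIG_ENDIAN)));
        System.out.println("little-endian 1: " + Arrays.toString(rawBytes(1, ByteOrder.LITTLE_ENDIAN)));
        // Matching the hardware is a single extra call on the buffer.
        System.out.println("native 1:        " + Arrays.toString(rawBytes(1, ByteOrder.nativeOrder())));
    }
}
```

Whether switching to nativeOrder() is appropriate depends on who else reads the bytes: data that leaves the process (files, sockets) has to use whatever order the other side expects.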
A quick example using two short values, 1 and 2, assuming your CPU is an x86 processor:
short A = 1;
short B = 2;
short C = (short) (A + B); // A + B is promoted to int in Java, so cast back to short
If your native processor expects little endian
MOV ax, short[A] ; ax register [ 01, 00 ]
MOV bx, short[B] ; bx register [ 02, 00 ]
ADD ax, bx ; ax register [ 03, 00 ]
MOV short[C], ax ; C [ 03, 00 ]
And you give it big endian, it has to do extra work.
MOV ax, short[A] ; ax register [ 00, 01 ]
MOV bx, short[B] ; bx register [ 00, 02 ]
XCHG al, ah ; ax register [ 01, 00 ] (BSWAP is only defined for 32/64-bit registers)
XCHG bl, bh ; bx register [ 02, 00 ]
ADD ax, bx ; ax register [ 03, 00 ]
MOV short[C], ax ; C [ 03, 00 ] (storing C as big endian would need one more swap)
So, on the very lowest level it matters, but unless you are noticing/profiling a major bottleneck in your code just use the default.
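The swap described above can also be seen from Java itself. The following is only a sketch (SwapDemo and swapMatches are hypothetical names): it writes a short in big-endian order, reads the same two bytes back as little-endian, and checks that the result equals Short.reverseBytes of the original value, which is the swap a little-endian CPU has to perform for a big-endian buffer.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class SwapDemo {
    // Hypothetical helper: returns true when reading big-endian bytes as
    // little-endian yields the byte-swapped value.
    static boolean swapMatches(short value) {
        ByteBuffer buf = ByteBuffer.allocate(Short.BYTES);

        // Store the value with Java's default (big-endian) layout.
        buf.order(ByteOrder.BIG_ENDIAN).putShort(0, value);

        // Reinterpret the same two bytes in little-endian order.
        short misread = buf.order(ByteOrder.LITTLE_ENDIAN).getShort(0);

        // The mismatch is exactly one byte swap.
        return misread == Short.reverseBytes(value);
    }

    public static void main(String[] args) {
        System.out.println(swapMatches((short) 1));
        System.out.println(swapMatches((short) 0x1234));
    }
}
```

How much this swap costs in practice depends on the JVM and the hardware, so it is worth measuring before changing the buffer order for performance reasons.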
Upvotes: 2