Reputation: 145
I'm writing a Java library that needs to compute SHA-1 hashes. During a common task, the JVM spends about 70% of its time in sun.security.provider.SHA.implCompress, 10% in java.util.zip.Inflater.inflate, and 2% in sun.security.provider.ByteArrayAccess.b2iBig64 (according to the NetBeans profiler).
I can't seem to get the Google search keywords right to get relevant results. I'm not very familiar with the SHA-1 hash algorithm. How can I get the most performance out of an SHA-1 MessageDigest? Is there a certain chunk size I should be digesting, or multiples of certain sizes I should try?
To answer some questions you're thinking about asking: I feed the data to the digest incrementally (via MessageDigest.update), so bytes are only digested once.

Upvotes: 6
Views: 3465
Reputation: 34563
SHA-1 has a block size of 64 bytes, so multiples of that are probably best; otherwise the implementation will need to copy partial blocks into buffers.
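A minimal sketch of that idea: feed the digest in chunks whose size is a multiple of 64 bytes (8192 here, a hypothetical choice), so the provider never has to buffer a partial block between update() calls.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ChunkedSha1 {
    // 8192 = 64 * 128, a multiple of SHA-1's 64-byte block size
    static final int CHUNK = 8192;

    // Digest the data in CHUNK-sized slices via incremental update() calls.
    static byte[] sha1(byte[] data) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        for (int off = 0; off < data.length; off += CHUNK) {
            md.update(data, off, Math.min(CHUNK, data.length - off));
        }
        return md.digest();
    }

    // Hex-encode a digest for display/comparison.
    static String hex(byte[] b) {
        StringBuilder sb = new StringBuilder();
        for (byte x : b) sb.append(String.format("%02x", x));
        return sb.toString();
    }
}
```

Any multiple of 64 works; larger chunks just amortize the per-call overhead of update() a bit more.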
Are you running on a multi-core computer? You could run the zlib decompression and SHA-1 hashing in separate threads, using something like java.util.concurrent.SynchronousQueue
to hand off each decompressed 64-byte block from one thread to the other. That way one core can be hashing a block while another core decompresses the next one.
(You could try one of the other BlockingQueue
implementations that has some storage capacity, but I don't think it'd help much. The decompression is much faster than the hashing, so the zlib thread would quickly fill up the queue and then it'd have to wait to put each new block, just like with the SynchronousQueue
.)
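A rough sketch of that hand-off, with a hypothetical blocks array standing in for the output of the zlib thread, and an empty array as the end-of-stream marker:

```java
import java.security.MessageDigest;
import java.util.concurrent.SynchronousQueue;

public class PipelinedSha1 {
    // Poison pill: a unique sentinel object marking end of stream.
    static final byte[] EOF = new byte[0];

    // Producer thread hands each block to this (consumer) thread through a
    // SynchronousQueue; put() blocks until the consumer take()s the block.
    static byte[] hashBlocks(byte[][] blocks) throws Exception {
        SynchronousQueue<byte[]> queue = new SynchronousQueue<>();
        Thread producer = new Thread(() -> {
            try {
                for (byte[] block : blocks) queue.put(block);
                queue.put(EOF);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        MessageDigest md = MessageDigest.getInstance("SHA-1");
        // Reference comparison is intentional: EOF is a unique sentinel.
        for (byte[] block = queue.take(); block != EOF; block = queue.take()) {
            md.update(block);
        }
        producer.join();
        return md.digest();
    }
}
```

In the real pipeline the producer loop would be the Inflater thread pushing each decompressed block as it is produced.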
I know you said you've optimized I/O already, but are you using asynchronous I/O? For maximum performance you don't want to hash one block and then ask the OS to read the next block, you want to ask the OS to read the next block and then hash the one you already have while the disk is busy fetching the next one. However, the OS probably does some readahead already, so this may not make a big difference.
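One way to sketch that overlap, assuming NIO's AsynchronousFileChannel and a hypothetical 8 KB double-buffer scheme: kick off the read for the next block, then hash the block you already have while the disk works.

```java
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.security.MessageDigest;
import java.util.concurrent.Future;

public class ReadAheadSha1 {
    // Double-buffered hashing: while the digest consumes one buffer,
    // the OS is already filling the other.
    static byte[] sha1File(Path file) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        try (AsynchronousFileChannel ch =
                 AsynchronousFileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer[] buf = { ByteBuffer.allocate(8192), ByteBuffer.allocate(8192) };
            long pos = 0;
            int i = 0;
            Future<Integer> pending = ch.read(buf[i], pos);
            while (true) {
                int n = pending.get();             // wait for the current block
                if (n <= 0) break;                 // end of file
                pos += n;
                int next = 1 - i;
                buf[next].clear();
                pending = ch.read(buf[next], pos); // start reading the next block...
                buf[i].flip();
                md.update(buf[i]);                 // ...while hashing this one
                i = next;
            }
        }
        return md.digest();
    }
}
```

As the answer notes, OS readahead may already give you most of this benefit for sequential scans.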
But beyond all that, a cryptographic hash function is a complex thing; it's just going to take time to run. Maybe you need a faster computer. :-)
Upvotes: 1
Reputation: 171178
Maybe you can call out to native code written in C. There must be a ton of super-optimized SHA-1 libraries available.
Upvotes: 2
Reputation: 2115
Have you tried switching the file processing to a memory-mapped file? Performance for those tends to be significantly faster than regular I/O and NIO.
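A minimal sketch of that approach, assuming the file fits in a single mapping (files over 2 GB would need to be mapped in chunks): MessageDigest.update accepts a ByteBuffer, so the mapped region can be fed to the digest without copying it into a heap byte[] first.

```java
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.security.MessageDigest;

public class MappedSha1 {
    // Map the whole file read-only and hand the mapped region
    // directly to the digest.
    static byte[] sha1Mapped(Path file) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            md.update(map);
        }
        return md.digest();
    }
}
```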
Upvotes: 0