Falk Tandetzky

Reputation: 6600

Which implementations of hash algorithms like SHA-256 for Java are thread-safe?

I need to compute SHA-256 hashes in a Java application. Apache's HmacUtils seems appropriate for this. However, the documentation says "This class is immutable and thread-safe. However the Mac may not be". Here, "Mac" refers to the returned instance, which actually calculates the hash.

Now, I want to be able to compute hashes from multiple threads, ideally without them blocking each other. What is the best way to achieve this? Is there a reference stating which implementations are actually thread-safe? I understand they would also need to be reentrant. Should I create one instance of Mac per thread to avoid thread-safety issues? (Is that sufficient? Is it considerably more expensive?)

Upvotes: -3

Views: 431

Answers (2)

bk2204

Reputation: 76409

Most hash algorithms that we use today operate by processing chunks of data (usually either 512 bits or 1024 bits) in sequence. This is true for SHA-2, SHA-3, and the non-parallel variants of BLAKE2, which are the most common secure hash algorithms in use today. The same applies to HMAC constructions and built-in MAC constructions based on these algorithms.

As a result, you will typically want to process an instance of these algorithms on a single thread anyway, because the algorithms themselves are not parallelizable. With the typical, non-pure approach to implementing hash algorithms (which is the approach almost every cryptographic library, including this one, uses), that means a MAC or hash instance is mutable and must either be used on a single thread or synchronized across threads. As I mentioned, the former is usually the better idea.

The cost of creating a hash or MAC instance is usually very small, so that is practically not a concern. However, if you're performing many MACs with the same key, it can be somewhat more efficient to create a single instance with the key initialized and then clone it, which saves the cost of hashing the first block for the key. This might be a significant performance improvement if the message is small.
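For illustration, here is a minimal sketch of that clone trick using javax.crypto.Mac directly. The key and message bytes are placeholders, and not every provider supports clone(), so treat this as an assumption-laden example rather than a prescription:

import java.nio.charset.StandardCharsets;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class ClonedMacSketch {
    public static void main(String[] args) throws Exception {
        byte[] keyBytes = "placeholder-key".getBytes(StandardCharsets.UTF_8);     // placeholder
        byte[] message  = "placeholder-message".getBytes(StandardCharsets.UTF_8); // placeholder

        // Set up one template Mac with the key; the keyed state is prepared once here.
        Mac template = Mac.getInstance("HmacSHA256");
        template.init(new SecretKeySpec(keyBytes, "HmacSHA256"));

        Mac mac;
        try {
            // Cloning reuses the template's keyed state, so the key setup cost
            // is not paid again for each message.
            mac = (Mac) template.clone();
        } catch (CloneNotSupportedException e) {
            // Some providers can't clone; fall back to paying the initialization cost again.
            mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(keyBytes, "HmacSHA256"));
        }

        byte[] tag = mac.doFinal(message);
        System.out.println(tag.length + "-byte tag computed");
    }
}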

There are algorithms, such as BLAKE3 and MD6, that use tree hashing, and these can often benefit from parallelization. Usually their documentation will tell you how threads are created and handled (often with a thread pool) and what thread-safety requirements are involved. These algorithms are often threaded internally, so instances may be safe to share across threads if the documentation says so. However, the library you're using doesn't offer any of them, so that isn't a concern you have to deal with.

If you're planning on processing small chunks of data, you'll almost certainly want to use a thread pool, since the overhead of spawning a thread can be non-trivial compared to hashing.
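As a rough sketch of that setup (the class and method names here are made up for illustration), you could hash many small messages on a fixed-size pool, creating a cheap MessageDigest per task so no instance is ever shared between threads:

import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class Sha256PoolSketch {
    // Hashes each message on a bounded pool; each task creates its own
    // MessageDigest, which is cheap and keeps every instance thread-confined.
    public static List<byte[]> hashAll(List<byte[]> messages) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        try {
            List<Callable<byte[]>> tasks = new ArrayList<>();
            for (byte[] message : messages) {
                tasks.add(() -> MessageDigest.getInstance("SHA-256").digest(message));
            }
            List<byte[]> hashes = new ArrayList<>();
            for (Future<byte[]> f : pool.invokeAll(tasks)) {
                hashes.add(f.get());
            }
            return hashes;
        } finally {
            pool.shutdown();
        }
    }
}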

Upvotes: 2

rzwitserloot

Reputation: 102775

There are a few ways to set this stuff up:

You create a thread for each job you have.

In other words, you make a new java.lang.Thread instance (or an instance of a class that extends Thread), which, if you run it, does one job and then ends.

This is a bad idea - at least, if speed is important and you have a ton of jobs to do. The actual cost of making those threads depends on tons of factors, truly a combinatorial explosion: JVM version, vendor, underlying OS, underlying architecture. It may well be a fine way to do it, speed-wise, but there are combos where this is far slower than the alternative, which is to have a limited set of threads (roughly equal to the number of cores in your CPU, maybe about 2x that, but linearly related to core count), and have those threads pull jobs off a central queue until all jobs are done.

There are various frameworks that make this 'nice' (such as fork/join, ExecutorService, etc.), and if you want to handroll all this yourself, you can use the collection types in java.util.concurrent for the 'central queue of all jobs to be done that all threads pull from', as in the sketch below.
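A hand-rolled version might look roughly like this sketch, with a shared BlockingQueue drained by a few long-lived worker threads (all names and the job data here are made up for illustration):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class HandRolledWorkersSketch {
    // Sentinel object used to tell a worker it can stop.
    private static final byte[] POISON = new byte[0];

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<byte[]> jobs = new LinkedBlockingQueue<>();
        int workers = Runtime.getRuntime().availableProcessors();

        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            threads[i] = new Thread(() -> {
                try {
                    for (byte[] job = jobs.take(); job != POISON; job = jobs.take()) {
                        // compute the hash/MAC for 'job' here
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }

        // Enqueue the real jobs, then one sentinel per worker so they all terminate.
        for (int i = 0; i < 10_000; i++) {
            jobs.put(new byte[] { (byte) i });
        }
        for (int i = 0; i < workers; i++) {
            jobs.put(POISON);
        }
        for (Thread t : threads) {
            t.join();
        }
    }
}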

Point is, worrying about the costs involved in creating a new Mac instance for each job (which is, effectively, what you're going to have to end up doing in order to avoid any thread-safety issues) is kinda silly if you haven't addressed the elephant in the room. In other words, if this describes your current setup, fix that first, so that you end up with:

You have a queue of jobs, and a pool of executors that pull jobs off of it.

... and the number of executors is limited (same order of magnitude as the number of CPU cores in the machine).

Then, relative to the job count, the number of threads you have is constant, i.e. negligible.

The solution is then simply to make one Mac object per thread. That means you have roughly one Mac object per CPU core, which is a trivial, ignorable cost. There is no point trying to use fewer Mac objects: you gain no measurable performance even if sharing would have been fine (i.e. if Mac objects carried no state at all, which is obviously not the case), and you may even lose performance if these Mac objects were thread-safe but stateful (the JVM is very good at eliminating synchronization costs when they are irrelevant because an object is only ever used from a single thread, even if sharing it would have worked).

One tool to accomplish this simply is to have a ThreadLocal:

private static final ThreadLocal<Mac> MACS = ThreadLocal.withInitial(
  () -> HmacUtils.getInitializedMac("algorithm", key));


public void codeThatIsRunInManyThreads() {
  Mac mac = MACS.get();
  mac.reset();
  // use mac here. Rest safely knowing it's all safe now.
}

That code ensures each thread creates a new Mac object, but only does so once - the created Mac object is reused for every job that thread runs. As many Mac objects are created as there are threads that end up doing at least one job. 32 CPU cores and 10,000 jobs? With a pool of about 2x the core count, roughly 64 Mac objects will be created.

Each Mac object is isolated to a single thread. If it isn't thread-safe, no problem. If it is, the synchronized blocks or other thread-safety mechanisms it employs incur little cost, as the JVM can fairly easily determine that they do nothing and eliminate them.

Once the threads do die (which they will once all jobs are done), the created Mac objects will be garbage collected like any other garbage (so, not until a long time passes or memory is needed - like all garbage).
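For completeness, here is a hedged sketch of how this could be wired together. The executor configuration and job data are made up for illustration, and the ThreadLocal from the snippet above is passed in as a parameter to keep the sketch self-contained:

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import javax.crypto.Mac;

public class MacJobRunnerSketch {
    // 'macs' is assumed to be the ThreadLocal<Mac> MACS field shown above.
    public static void runAll(List<byte[]> jobs, ThreadLocal<Mac> macs) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(
                2 * Runtime.getRuntime().availableProcessors());
        for (byte[] job : jobs) {
            pool.submit(() -> {
                Mac mac = macs.get();   // created once per worker thread, then reused
                mac.reset();
                byte[] tag = mac.doFinal(job);
                // do something with 'tag' here
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}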

Upvotes: 1
