How does Akka benefit from ForkJoinPool?

Question

The Akka docs state that the default dispatcher is a fork-join-executor because it "gives excellent performance in most cases".
I'm wondering why that is.

From the ForkJoinPool Javadoc:

A ForkJoinPool differs from other kinds of ExecutorService mainly by virtue of employing work-stealing: all threads in the pool attempt to find and execute tasks submitted to the pool and/or created by other active tasks (eventually blocking waiting for work if none exist). This enables (1) efficient processing when most tasks spawn other subtasks (as do most ForkJoinTasks), as well as (2) when many small tasks are submitted to the pool from external clients. Especially when setting asyncMode to true in constructors, ForkJoinPools may also be (3) appropriate for use with event-style tasks that are never joined.

My first guess is that Akka is not an example of case (1), because I can't figure out how Akka could be forking tasks. What task would there be that could be forked into many subtasks?
I see each message as an independent task, which is why I think Akka is closer to case (2), where the messages are many small tasks submitted (via ! and ?) to the ForkJoinPool.
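To make the distinction between cases (1) and (2) concrete, here is a minimal Java sketch. It is only an illustration, not Akka code: the class name `ForkJoinCases`, the `RangeSum` task, and both helper methods are hypothetical names made up for this example.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ForkJoinCases {

    // Case (1): a task that splits itself into subtasks and joins their results.
    static class RangeSum extends RecursiveTask<Long> {
        private final int from; // inclusive
        private final int to;   // exclusive

        RangeSum(int from, int to) {
            this.from = from;
            this.to = to;
        }

        @Override
        protected Long compute() {
            if (to - from <= 1_000) {
                // Small enough: sum the range directly.
                long sum = 0;
                for (int i = from; i < to; i++) {
                    sum += i;
                }
                return sum;
            }
            int mid = from + (to - from) / 2;
            RangeSum left = new RangeSum(from, mid);
            RangeSum right = new RangeSum(mid, to);
            left.fork();                          // pushed to this worker's queue, may be stolen
            return right.compute() + left.join(); // wait for the forked half
        }
    }

    static long forkJoinSum(int n) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new RangeSum(0, n));
        } finally {
            pool.shutdown();
        }
    }

    // Case (2): an external (non-pool) thread submits many small independent
    // tasks; nothing is forked or joined.
    static int submitIndependentTasks(int count) {
        ForkJoinPool pool = new ForkJoinPool();
        AtomicInteger processed = new AtomicInteger();
        for (int i = 0; i < count; i++) {
            pool.execute(() -> processed.incrementAndGet());
        }
        pool.shutdown(); // already submitted tasks still run
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return processed.get();
    }

    public static void main(String[] args) {
        System.out.println(forkJoinSum(100_000));
        System.out.println(submitIndependentTasks(10_000));
    }
}
```

In the first method the parallelism comes from the task forking itself; in the second it comes only from the number of independent submissions, which is the shape I believe Akka's messages have.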

The next question, although not strictly related to Akka, is: why can a use case that doesn't use fork and join (the main capabilities of ForkJoinPool, which enable work-stealing) still benefit from ForkJoinPool?
From Scalability of Fork Join Pool:

We noticed that the number of context switches was abnormal, above 70000 per second.
That must be the problem, but what is causing it? Viktor came up with the qualified guess that it must be the task queue of the thread pool executor, since that is shared and the locks in the LinkedBlockingQueue could potentially generate the context switches when there is contention.

However, if it is true that Akka doesn't use ForkJoinTasks, all tasks submitted by external clients will be queued in the shared queue, so the contention should be the same as in ThreadPoolExecutor.

So, my questions are:

Does Akka use ForkJoinTasks (case (1)), or is it an instance of case (2)?
Why is ForkJoinPool beneficial in case (2) if all the tasks submitted by external clients are pushed to a shared queue and no work-stealing happens?
What would be an example of "event-style tasks that are never joined" (case (3))?

Update

The correct answer is the one from johanandren; however, I want to add some highlights.

Akka doesn't use the fork and join capabilities because, AFAIK, with the Actor model, or at least how we implement it, there isn't really a use case for that (from johanandren's comment).
So my understanding that Akka is not an instance of case (1) was correct.
In my original question I said that all tasks submitted by external clients will be queued in the shared queue.
This was correct, but only for a previous version (JDK 7) of the FJP. In JDK 8 the single submission queue was replaced by many "submission queues". This answer explains it well:

Now, before (IIRC) JDK 7u12, ForkJoinPool had a single global submission queue. When worker threads ran out of local tasks, as well the tasks to steal, they got there and tried to see if external work is available. In this design, there is no advantage against a regular, say, ThreadPoolExecutor backed by ArrayBlockingQueue. [...]
Now, the external submission goes into one of the submission queues. Then, workers that have no work to munch on, can first look into the submission queue associated with a particular worker, and then wander around looking into the submission queues of others. One can call that "work stealing" too.

So this enabled work stealing in scenarios where fork/join isn't used. As Doug Lea says:

Substantially better throughput when lots of clients submit lots of tasks. (I've measured up to 60X speedups on micro-benchmarks). The idea is to treat external submitters in a similar way as workers -- using randomized queuing and stealing. (This required a big internal refactoring to disassociate work queues and workers.) This also greatly improves throughput when all tasks are async and submitted to the pool rather than forked, which becomes a reasonable way to structure actor frameworks, as well as many plain services that you might otherwise use ThreadPoolExecutor for.

There is another peculiarity of FJP worth mentioning, taken from this comment:

4% is indeed not much for FJP. There's still a trade-off you do with FJP which you need to be aware of: FJP keeps threads spinning for a while to be able to handle just-in-time arriving work faster. This ensures good latency in many cases. Especially if your pool is overprovisioned, however, the trade-off is a bit of latency against more power consumption in almost-idle situations.

johanandren · Accepted Answer

The FJP in Akka is run with asyncMode = true, so the answer to the first question is case (2): external clients submitting short/small async workloads. Each submitted workload either dispatches an actor to process one or a few messages from its inbox, or executes Scala Future operations.

When a non-ForkJoinTask is scheduled to run on the FJP, it is adapted to a ForkJoinTask and enqueued just like any other ForkJoinTask. There isn't a single submission queue where tasks are queued (there was in an early version, JDK 7 perhaps); there are many, to avoid contention, and an idle thread can pick (steal) tasks from queues other than its own when its own is empty.
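As a rough illustration of the above, the following Java sketch builds a pool with asyncMode = true and has several external threads submit plain Runnables wrapped with `ForkJoinTask.adapt`. This is not Akka's dispatcher code: `AsyncModePoolSketch`, `newAsyncPool`, and `runMailboxBatches` are hypothetical names, the parallelism of 4 is arbitrary, and the counter increment merely stands in for "process a few messages from a mailbox".

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class AsyncModePoolSketch {

    // Builds a pool in async mode, which suits event-style tasks that are
    // submitted and never joined.
    static ForkJoinPool newAsyncPool(int parallelism) {
        return new ForkJoinPool(
            parallelism,
            ForkJoinPool.defaultForkJoinWorkerThreadFactory,
            null,   // no custom uncaught-exception handler
            true);  // asyncMode
    }

    // Several external threads submit small, independent tasks.
    // Each task is a stand-in for processing one batch of messages.
    static int runMailboxBatches(int submitters, int batchesPerSubmitter) {
        ForkJoinPool pool = newAsyncPool(4);
        AtomicInteger processed = new AtomicInteger();
        Thread[] clients = new Thread[submitters];

        for (int s = 0; s < submitters; s++) {
            clients[s] = new Thread(() -> {
                for (int b = 0; b < batchesPerSubmitter; b++) {
                    Runnable batch = () -> processed.incrementAndGet();
                    // A plain Runnable is adapted to a ForkJoinTask before it is
                    // enqueued; pool.execute(Runnable) would wrap it as well.
                    ForkJoinTask<?> task = ForkJoinTask.adapt(batch);
                    pool.execute(task);
                }
            });
            clients[s].start();
        }

        try {
            for (Thread client : clients) {
                client.join();           // all submissions are done
            }
            pool.shutdown();             // already submitted tasks still run
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
        return processed.get();
    }

    public static void main(String[] args) {
        System.out.println(runMailboxBatches(4, 1_000));
    }
}
```

Nothing here forks or joins; the tasks are spread over the pool's queues purely through external submission, which is the usage pattern described above.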

Note that by default we are currently running on a forked version of the Java 8 FJP, as we saw a significant decrease in throughput with the Java 9 FJP when it arrived (it contains quite a few changes). Issue #21910 discusses that, if you are interested. Additionally, if you want to play around with benchmarking different pools, you can find a few *Pool benchmarks here: https://github.com/akka/akka/tree/master/akka-bench-jmh/src/main/scala/akka/actor
