Reputation: 21
I recently came across this question for an assessment:
ExecutorService threadpool = Executors.newFixedThreadPool(N);
for(Runnable task : tasks){
threadpool.submit(task);
}
Each task spends 25% for the computation and 75% doing I/O. assume we are working on a quad core machine(no hyper threading), what should be the size of thread pool N to achieve maximum performance without wasting threads? (assume we have infinite I/O capacity)
I guessed 16 as the machine has infinite I/O it means, that we can concentrate fully on CPUs. Each task is using one quarter of a CPU while running. That means, we can have four tasks running to saturate one CPU core and that makes N=16 on a quad core machine.
update: the options for this questions were 2,4,5,6,7,8,12,and 16.
Upvotes: 2
Views: 3192
Reputation: 971
Although this question has no strictly right or wrong answer, the subjectively good one would be:
32 Threads
You have to think in terms of probability. For now let's just consider one CPU core and independent threads:
One thread has a 25% chance to be doing a computation at any given time. If you have 2 independent threads (probability events), the probability of having at least one doing some CPU work is not 50% but 7/16 (43.75%). (If you are not sure about that, you should refresh some of those probability skills).
You probably see where this is going. For the P to be 100%, the thread count would have to be infinite. So we have to make an educated guess: 4 threads have a P of ~68%, 8 threads ~90%. To go up in count would be now really unproductive, so we settle at 8. That is for one core. We have 4 CPU cores, so we can multiply that by 4 and we get the final answer: 32.
Upvotes: 2
Reputation: 59144
You are correct that you should be thinking about saturating your cores. The best answer will be more than 16, though. If you have only 16 threads, then the CPU demands aren't going to align perfectly so that all your cores are in use all the time.
So the best answer is > 16, but also small enough not to significantly increase individual task completion time, impose significant thread-switching costs, or waste a whole lot of memory.
If you learned this in class, then your prof probably gave you multiplier to use as a "rule of thumb". He would be expecting you to remember it and apply it here.
I usually use average_demand = 2*num_cores, so would pick 32 threads. This works well in most cases. When the average CPU demand is twice the number of cores, the core utilization will be pretty close to 100%.
Also, in this case, the CPU portion of each task only gets 1/2 core on average, so it takes twice as long... but it's only 25% of the work so the task completion time is only 13% more than optimal.
The 2-times default that I use is almost always higher than the optimal number, but its also almost always low enough not to impose significant extra overheads. If you know that your tasks are very heavily CPU-bound, then you can confidently reduce this number.
If you really want to find the optimal value, then you can measure it, but when your in the right range it's not gonna make a lot of difference.
--
P.S NOTE: The 'average_demand' I used above is the expected number of cores that would be in use at any time given N threads and N cores.
Upvotes: 3