Reputation: 4000
we are currently trying to tweak performance by using multithreading in our java app. We have a long running serial task which we would like to split to multi CPU cores.
Basically we have list with let's say 100.000 items / things to do.
My question now is is it better to do:
Option 1 (Pseudocode):
for(i = 0; i < 100000; i++){
threadpool.submit(new MyCallable("1 thing to do"))
}
This would add 100000 runnables/callables to the queue of the threadpool (current LinkedBlockingQueue)
or is it better to do: Option 2 (Pseudocode)
for(i = 0; i < 4; i++){
threadpool.submit(new MyCallable("25000 things to do"))
}
We have tried already option 1, and we didn't notice any performance improvement, although we can clearly see that multiple threads are working like crazy and also 4 CPU cores used. But my feeling is that there is some overhead in option 1 because of the many tasks. We haven't tried option 2 yet, but my feeling is, that it could speed up things as there is less overhead. We are basically splitting the list into 4 larger chunks instead 100000 single items.
Any thoughts on this?
Thanks
Upvotes: 4
Views: 1696
Reputation: 6499
The problem with 4 batches can be that if one of them finishes in 10 minutes, and three of them are for 20 minutes, than 1 core will not be used for 10 minutes while 3 other threads will process items at 3 cores. But you are right about overhead. But the only way to verify is to check it, because a lot of things depends on your data.
Upvotes: 1
Reputation: 533442
You have N cores in your machine. You want to make use of all your cores but with a minimum of overhead. Thus the minimum number of tasks is likely to N if the tasks are equal in size. If they are not equal having M*N tasks can be better as it can mean all cores are equally busy even though some tasks are relatively short. e.g. one core does one long task while another does three short ones. I use an M of 2-4 for most of my use cases.
If you can, you can sort the longer running tasks to be started first to get the best balance. i.e. sort the tasks from longest to shortest before adding them.
e.g. if you have 8 cores, you may find that 8 tasks is optimal for CPU bound processing. For IO bound processing or tasks which takes differeing amound of time 2*8 to 4*8 tasks may be optimal.
Upvotes: 1
Reputation: 95316
What matters is that you minimize the amount of context switching, and maximize the amount of work per task that it spends computing. As a practical matter, if your tasks are computing, exceeding the number of physical CPUs isn't going to help. If your tasks actually do a lot of I/O and I/O waits, you want to have many of them so there is always a bunch of "ready" tasks available when one blocks.
If you really have 25000 things to do, and the things are computation, I'd probably set up 32 threads (more CPUs than you have, but not a lot of extra overhead) and parcel out 10-50 units of work to each one if those units are relatively small.
Upvotes: 3
Reputation: 9117
Well, it depends on your use-case.
Performance wise, I think having larger chunks of work is better than smaller many threads. Context switching would be lesser, and thus, you would be able to save your CPU cycles and RAM.
When the number of tasks are smaller, this might not matter much, but yeah, if you have 10000 threads, it does matter.
Upvotes: 1
Reputation: 17182
Your analysis is correct: there will be less cost (memory, context switching, and general instruction count) in batching the items - at least, generally speaking.
This becomes less and less relevant as the individual tasks get larger, though - if you already spend 99 percent of your time doing work, rather than threadpool overhead or object creation, you can only optimise the 1 percent left this way.
Upvotes: 3