Reputation: 21
I'm encountering an issue with my MATLAB R2019b GUI project. I'm trying to display two sets of 4DCT (four-dimensional computed tomography) images simultaneously. Each set contains 10 CT volumes. While displaying the first case (two sets), I want to load additional sets asynchronously in the background. My approach is to call parfeval(@select_data, x, data{idx}) for each 4DCT, where select_data is a function that uses parfor to read the 10 DICOM volumes.
When I run select_data on its own, it performs well and uses all available workers (e.g., 12 workers). However, when I call it through parfeval, only one additional worker is used, which leads to slower loading (about 30-40 seconds per 4DCT). It also causes lag in the main thread, even though I expected several workers to handle the loading and the rest to remain available for the main thread.
Here are the server specs I have access to:
CPU(s): 24
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 2
Memory: 70GB
I've attempted to limit the number of workers to 8 in the parpool, leaving 4 cores free to handle the main GUI thread, but the lag persists.
Is it possible that the issue lies with the use of parfor within parfeval, which might not be leveraging all available workers as expected?
Edit: Using parfor within parfeval does not give any parallelism. When parfor is called from a function started by parfeval, it is processed as a normal for-loop.
Upvotes: 2
Views: 291
Reputation: 25160
You certainly can use parfor in the functions invoked using parfeval. It is difficult to do this in a non-trivial manner in R2019b, though. In later releases of MATLAB (>=R2020a), you can use a thread pool inside each local process worker. That is awkward to set up, and even then there might be challenges getting everything you need running on thread-pool workers.
The approach I would explore is converting your select_data so that you can run it directly in chunks inside parfeval, i.e. instead of something like this:
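function out = select_data(in)
    n = numel(in);              % number of iterations (assumed here to be numel(in))
    parfor i = 1:n
        out(i) = dostuff(i);    % each iteration is independent of the others
    end
end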
You work out how many iterations you need and then call the body of the parfor loop directly from the client via parfeval, like this:
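fut(1:n) = parallel.FevalFuture;          % preallocate the array of futures
for i = 1:n
    fut(i) = parfeval(@dostuff, 1, i);    % queue one iteration per future
end
out = fetchOutputs(fut);                  % blocks until every future has finished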
This would also let you use e.g. afterAll on the array of futures to do something else asynchronously once all the work is finished.
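As a rough sketch of the afterAll idea, assuming a hypothetical readVolume helper that stands in for reading a single DICOM volume (it is not part of the original code), the pattern could look something like this. With 'PassFuture' set to true the callback receives the futures themselves rather than their concatenated outputs.

```matlab
% Hypothetical stand-in for reading one DICOM volume (assumption for illustration).
readVolume = @(k) k * ones(4, 4, 4);

n = 10;                                   % volumes per 4DCT, as in the question
fut(1:n) = parallel.FevalFuture;          % preallocate the array of futures
for k = 1:n
    fut(k) = parfeval(readVolume, 1, k);  % one volume per future, on the current pool
end

% Runs on the client once every future has finished; the client is not
% blocked in the meantime, so a GUI can stay responsive.
done = afterAll(fut, ...
    @(f) fprintf('%d volumes loaded\n', numel(f)), 0, 'PassFuture', true);

% Only for this demo: block until the data is ready and collect it.
vols = fetchOutputs(fut, 'UniformOutput', false);   % n-by-1 cell of volumes
```

In a real GUI the blocking fetchOutputs call at the end would probably be dropped, and the afterAll callback would fetch the volumes and hand them to the display code instead.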
Upvotes: 4
Reputation: 18187
I expect that nesting parallel constructs will not yield any additional speed-up. You assign a number of workers to the outer parallelisation, which leaves no room to recruit further workers beyond those already assigned to that outer level. In any case, with nested loops it's recommended to parallelise the outermost loop, and I expect the same to hold true for nesting other parallelisation constructs.
If your outer structure only contains two elements (i.e. your idx only runs to 2), you might be better off evaluating that loop serially and using parallelisation on the reading, since you've already verified that this does use more workers and thus reduces execution time.
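A minimal sketch of that layout, with a serial outer loop over the sets and a parfor over the volumes, might look as follows. The readVolume helper is hypothetical and only stands in for the actual DICOM reading; feval is used to call the handle so that parfor does not try to treat it as an indexed variable.

```matlab
% Hypothetical stand-in for reading volume k of 4DCT set s (assumption).
readVolume = @(s, k) (10 * s + k) * ones(4, 4, 4);

nSets = 2;                 % two 4DCT sets, as in the question
nVols = 10;                % volumes per set
sets  = cell(1, nSets);

for s = 1:nSets            % outer loop stays serial
    vols = cell(1, nVols);
    parfor k = 1:nVols     % inner read is spread across the pool workers
        vols{k} = feval(readVolume, s, k);
    end
    sets{s} = vols;
end
```

Note that parfor blocks the client until the loop finishes, so this variant trades GUI responsiveness for faster loading of each set.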
There's a bunch of background links on how and when to leverage parallelisation in this answer of mine as well as this one.
There's an example in the parfeval docs that updates a UI, which might be worth a try. Alternatively, you could try spmd to parallelise your reading and assign a limited number of workers to each spmd block. Given that you have two sockets with 6 cores each, I'd set the number of workers to 6, leaving a single socket for the main thread:
parpool(6)
spmd
    your_code()
end
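To make the spmd suggestion a little more concrete, here is a sketch that splits the 10 volumes round-robin across the workers. It assumes a hypothetical readVolume helper in place of the real DICOM reader, and it uses labindex and numlabs, which are the worker index and worker count inside an spmd block in R2019b.

```matlab
% Hypothetical stand-in for reading one DICOM volume (assumption).
readVolume = @(k) k * ones(4, 4, 4);
nVols = 10;

if isempty(gcp('nocreate'))
    parpool(6);                        % six workers, as suggested above
end

spmd
    % Round-robin split: with six workers, worker 1 reads volumes 1 and 7,
    % worker 2 reads volumes 2 and 8, and so on.
    myIdx  = labindex:numlabs:nVols;
    myVols = arrayfun(readVolume, myIdx, 'UniformOutput', false);
end

% myIdx and myVols are Composite objects on the client; gather them into one cell.
vols = cell(1, nVols);
for w = 1:numel(myVols)
    vols(myIdx{w}) = myVols{w};
end
```

Keep in mind that an spmd block is synchronous on the client, so the GUI would still be unresponsive while the block runs.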
Upvotes: 2