max_hid

Reputation: 21

Can I use parfor within a parfeval in Matlab R2019b and if yes how?

I'm encountering an issue with my MATLAB R2019b GUI project. I'm trying to display two sets of 4DCT (four-dimensional computed tomography) images simultaneously. Each set contains 10 CT volumes. While displaying the first case (two sets), I want to load additional sets asynchronously in the background. My approach is to call parfeval(@select_data, x, data{idx}) for each 4DCT, where select_data is a function that uses parfor to read the 10 DICOM volumes.

When I run select_data on its own, it performs well and uses all available workers (e.g., 12). However, when I call it through parfeval, only one additional worker is used, so loading is slow (about 30-40 seconds per 4DCT). It also causes lag in the main thread, even though I expected several workers to handle the loading and the rest to remain available for the main thread.

Here are the server specs I have access to:

CPU(s): 24
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 2

Memory: 70GB

I've attempted to limit the number of workers to 8 in the parpool, leaving 4 cores free to handle the main GUI thread, but the lag persists.

Could the issue be the use of parfor within parfeval, which might not be using all available workers as expected?

Edit: Using parfor within parfeval does not work as hoped: when the function is called through parfeval, its parfor is processed as a normal for-loop.
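One way to check this behaviour on a given installation is to count how many distinct workers execute the iterations of a parfor loop, once when the function is called from the client and once when it is submitted through parfeval. The sketch below is only a diagnostic: count_workers_used is a hypothetical helper name, the pause is a stand-in for real work, and it assumes Parallel Computing Toolbox with an open (or auto-created) pool.

```matlab
function nUsed = count_workers_used(n)
% COUNT_WORKERS_USED  Count the distinct workers that run a parfor loop.
%   Hypothetical diagnostic helper; save as count_workers_used.m, then compare:
%     nDirect = count_workers_used(24);                              % called on the client
%     nNested = fetchOutputs(parfeval(@count_workers_used, 1, 24));  % called on a worker
    ids = zeros(1, n);
    parfor i = 1:n
        t = getCurrentTask();   % empty on the client, a task object on a worker
        if ~isempty(t)
            ids(i) = t.ID;      % record which worker task ran this iteration
        end
        pause(0.05);            % stand-in for real work so iterations spread out
    end
    nUsed = numel(unique(ids));
end
```

If the nested parfor really runs as a plain for-loop, the parfeval call should report a single worker, while the direct call should report several.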

Upvotes: 2

Views: 291

Answers (2)

Edric

Reputation: 25160

You certainly can use parfor in functions invoked using parfeval. It is difficult to do this in a non-trivial manner in R2019b, though. In later releases of MATLAB (R2020a and later), you can use a thread pool inside each local process worker. That is awkward to set up, and even then there might be challenges getting everything you need running on thread-pool workers.

The approach I would explore is converting your select_data so that you can run it directly in chunks via parfeval. That is, instead of something like this:

function out = select_data(in)
    n = numel(in);   % assumed here: one iteration per element of the input
    parfor i = 1:n
        out(i) = dostuff(i);
    end
end

You work out how many iterations you need and then call the body of the parfor loop directly from the client via parfeval, like this:

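fut(1:n) = parallel.FevalFuture;        % preallocate the array of futures
for i = 1:n
    fut(i) = parfeval(@dostuff, 1, i);  % queue one iteration as its own task
end
out = fetchOutputs(fut);                % blocks until every future has finished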

This would also let you use, e.g., afterAll on the array of futures to do something else asynchronously once all the work is finished.
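As a rough illustration of that idea, the sketch below queues one parfeval task per volume and attaches afterEach and afterAll callbacks, so the client is not blocked while the volumes load. The readVolume handle is a hypothetical stand-in for the real DICOM reader, and the 4x4x4 random arrays are placeholder data. Each result is wrapped in a cell so that the outputs of all futures can be combined safely.

```matlab
% Sketch only: readVolume is a hypothetical placeholder for the real reader.
readVolume = @(k) {rand(4, 4, 4) * k};   % wrap the volume in a 1x1 cell
nVolumes = 10;                           % one 4DCT = 10 CT volumes

pool = gcp();                            % reuse the current pool or start one
fut(1:nVolumes) = parallel.FevalFuture;  % preallocate the array of futures
for k = 1:nVolumes
    fut(k) = parfeval(pool, @readVolume, 1, k);
end

% Runs on the client each time a single volume becomes available.
afterEach(fut, @(v) fprintf('Loaded a volume of size %s\n', mat2str(size(v{1}))), 0);

% Runs on the client once, after every future has finished.
done = afterAll(fut, @(vols) fprintf('%d volumes loaded\n', numel(vols)), 0);
```

The script returns immediately after queuing the work. In a GUI, the callbacks would update the display instead of printing, and wait(done) or fetchOutputs(fut) can still be used where blocking is acceptable.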

Upvotes: 4

Adriaan

Reputation: 18187

I expect that nesting parallel constructs will not yield any additional speed-up. You assign a number of workers to the outer parallelisation, which leaves no room to use even more workers from within that already parallel outer level. In any case, it's recommended to parallelise the outermost loop in the case of nested loops; I expect the same to hold for nesting other parallelisation constructs.

If your outer structure contains only two elements (i.e. idx only runs up to 2), you might be better off evaluating that loop serially and parallelising the reading, since you've already verified that doing so uses more workers and thus reduces execution time.

There's a bunch of background links on how and when to leverage parallelisation in this answer of mine as well as this one.

There's an example in the parfeval docs that updates a UI, which might be worth a try. Alternatively, you could try spmd to parallelise your reading and assign a limited number of workers to each spmd block. Given you have two sockets with 6 cores each, I'd set the number of workers to 6, leaving a whole socket for the main thread:

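parpool(6)          % one socket's worth of workers; the other stays free for the GUI
spmd
    your_code()     % runs once on every worker; use labindex to pick each worker's share
end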
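A slightly fuller sketch of the spmd variant is given below, with the volumes divided round-robin over the workers via labindex and numlabs (the names available in R2019b). The readVolume handle and the 4x4x4 placeholder arrays are hypothetical stand-ins for the real DICOM reading. Note that spmd blocks the client until the block finishes, so on its own it does not provide the background loading asked about in the question.

```matlab
% Sketch only: readVolume is a hypothetical placeholder for the real reader.
readVolume = @(k) rand(4, 4, 4) * k;
nVolumes = 10;

if isempty(gcp('nocreate'))
    parpool(6);                          % start a pool only if none is open
end

spmd
    myIdx = labindex:numlabs:nVolumes;   % this worker's round-robin share
    myVols = cell(1, numel(myIdx));
    for k = 1:numel(myIdx)
        myVols{k} = readVolume(myIdx(k));
    end
end

% On the client, myIdx and myVols are Composite objects: one entry per worker.
volumes = cell(1, nVolumes);
for w = 1:length(myVols)
    idx = myIdx{w};                      % indices handled by worker w
    vols = myVols{w};                    % volumes read by worker w
    for k = 1:numel(idx)
        volumes{idx(k)} = vols{k};
    end
end
```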

Upvotes: 2
