Reputation: 10176
I can't figure out the problem in the following short script, which should compare the computation time of a single-CPU computation with that of a parallelized one.
Link to full image: LINK
The code is:
n = 700;

% Serial version
ranksSingle = zeros(1,n);
tic
for ind = 1:n
    ranksSingle(ind) = rank(magic(ind));
end
toc

% Parallel version with 4 local workers
matlabpool local 4
tic
ranks = zeros(1,n);
parfor (ind = 1:n)
    ranks(ind) = rank(magic(ind));
end
toc

% Check that both versions give the same result
isequal(ranksSingle, ranks)
matlabpool close
I also tried it with matlabpool 2. As you can clearly see from the process window, all cores are 100% busy when running the parallel computation (marked red).
When running the single-CPU computation (marked blue), strangely, the 4 cores are also busier than before. I would have expected only ONE core to go up. I searched the internet to see whether the magic() or rank functions are perhaps parallelized internally, but as you can read here: http://www.walkingrandomly.com/?p=1894 that's not the case. So it's okay that those 4 cores are not fully busy, but I'm still wondering why ALL cores go up.
Secondly, I really wonder about the computation time of the parallelized version. I know there's some overhead from distributing the jobs to the individual cores, but it shouldn't be so high that there's no benefit at all in the end :(
Perhaps somebody can tell me something about it :( I'm really stuck on this, since I want to speed up some of my for-loops. My second question is whether there's a command to always set the number of workers to the number of physical cores in my computer (and also to use Hyper-Threading, if that's an additional benefit)?
Thanks a lot!
Upvotes: 4
Views: 1389
Reputation: 74930
When you want to run a parallel job, you should remember that it's bad to have too many fast iterations, and that it's bad to have too few slow iterations. If you do a million iterations that each take a few milliseconds, the overhead from parallelization will destroy any possible gain. If you do nine iterations that take an hour each, and you run them on eight processors in parallel, seven processors will be idling for an hour waiting for iteration #9 to finish.
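To put numbers on the nine-iteration case, here is a small sketch of the arithmetic. It assumes an idealized model in which every iteration takes exactly the same time and parallel overhead is zero; the variable names are only illustrative.

```matlab
% Idealized model: equal iteration times, no parallel overhead (assumption).
nIter    = 9;   % number of loop iterations
tIter    = 1;   % time per iteration, in hours
nWorkers = 8;   % number of parallel workers

serialTime   = nIter * tIter;                     % all iterations one after another
parallelTime = ceil(nIter / nWorkers) * tIter;    % iterations run in "rounds" of nWorkers
speedup      = serialTime / parallelTime;

fprintf('serial: %g h, parallel: %g h, speed-up: %.1fx (ideal would be %dx)\n', ...
        serialTime, parallelTime, speedup, nWorkers);
```

Under these assumptions the ninth iteration forces a second round, so the parallel run takes two hours instead of a little over one.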
Thus, your example is pretty bad for testing the impact of parallelization, because both magic and rank are way too fast.
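function testParfor2

% Serial loop: four iterations of one second each
tic
for i=1:4
    pause(1); %# wait for 1 second
end
toc

% Open the pool before starting the timer, so pool start-up is not counted
matlabpool open 4

% Parallel loop: the same four iterations spread over four workers
tic
parfor i=1:4
    pause(1); %# wait for 1 second
end
toc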
Elapsed time is 4.050287 seconds.
Elapsed time is 1.534534 seconds.
Note that I was running a second parallel job at the same time, but roughly, the result should be reproducible: there is a bit of overhead (note that I didn't count the time used by matlabpool!), but the speed-up is there. You should see the same amount of overhead if you increase the pause length. Also, you should test with your actual loops (try to parallelize the outermost loop, btw).
To your second question:
matlabpool open
will create as many workers as there are physical cores. Hyperthreading then helps keep the computer responsive while the parallel job is running.
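If you want to check this on your own machine, something like the following sketch could work. It assumes a MATLAB release that still has matlabpool and a default local profile; note that feature('numcores') is an undocumented call, so treat its result as a hint rather than a guarantee.

```matlab
% Open a pool with the default settings and see how many workers were started.
matlabpool open
nWorkers = matlabpool('size');     % number of workers in the open pool

% Undocumented call (assumption: available in your release); reports
% the number of cores MATLAB detects.
nCores = feature('numcores');

fprintf('pool workers: %d, detected cores: %d\n', nWorkers, nCores);
matlabpool close
```

If the two numbers differ, the worker limit in your local profile is probably what decides the pool size.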
Finally, while magic and rank may not be fully multithreaded by themselves, they may make calls to multithreaded routines.
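One way to test whether such implicit multithreading explains the extra core activity in the serial loop is to time it once with the default thread setting and once limited to a single computational thread. This is only a sketch: it uses maxNumCompThreads, and n is set lower than the 700 from the question just to keep the run short.

```matlab
n = 200;   % smaller than in the question, to keep the run short (assumption)

% Serial loop with the default number of computational threads
ranksDefault = zeros(1, n);
tic
for ind = 1:n
    ranksDefault(ind) = rank(magic(ind));
end
tDefault = toc;

% Limit MATLAB to one computational thread; the call returns the old setting
oldThreads = maxNumCompThreads(1);

ranksOneThread = zeros(1, n);
tic
for ind = 1:n
    ranksOneThread(ind) = rank(magic(ind));
end
tOneThread = toc;

% Restore the previous thread setting
maxNumCompThreads(oldThreads);

fprintf('default threads: %.2f s, one thread: %.2f s\n', tDefault, tOneThread);
```

If only one core goes up during the second loop, the activity on the other cores in the first loop most likely comes from multithreaded library routines rather than from anything in the loop itself.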
Upvotes: 5