Reputation: 2825
I wrote a function that computes the optimal number of clusters (k) for a dataset using the gap statistic. At one point the algorithm has to compute the dispersion (i.e., the sum of the distances between every point and its centroid) for, say, 100 different datasets (called "test datasets" or "reference datasets"). Since these computations are independent, I want to parallelize them across all the cores. I have MathWorks' Parallel Computing Toolbox but I am not sure how to use it (problem 1; I can probably work this out from past threads). However, my real problem is a different one: the toolbox seems to allow at most 12 cores (problem 2). My machine has 64 cores and I need to use all of them. Do you know how to parallelize a process across more than 12 cores?
For your information this is the bit of code that should run in parallel:
    % This loop is repeated n_tests times, where n_tests is equal
    % to the number of reference datasets we want to use
    for id_test = 2:n_tests+1
        test_data = generate_test_data(data);
        %% Calculate the dispersion(s) for the generated dataset(s)
        dispersions(id_test, 1:1:max_k) = zeros;
        % We calculate the dispersion for the id_test reference dataset
        for id_k = 1:1:max_k
            dispersions(id_test, id_k) = calculate_dispersion(test_data, id_k);
        end
    end
Upvotes: 1
Views: 3239
Reputation: 25140
Please note that in R2014a the limit on the number of local workers was removed. See the release notes.
Upvotes: 5
Reputation: 2699
The number of local workers available with Parallel Computing Toolbox is license dependent. When introduced, the limit was 4; this changed to 8 in R2009a; and to 12 in R2011b.
If you want to use 16 workers, you will need a 16-node MDCS (MATLAB Distributed Computing Server) licence, and you'll also need to set up some sort of scheduler to manage them. There are detailed instructions on how to do this here: http://www.mathworks.de/support/product/DM/installation/ver_current/. Once you've done that, yes, you'll be able to do "matlabpool open 16".
EDIT: As of Matlab version R2014a there is no longer a limit on the number of local workers for the Parallel Computing Toolbox. That is, if you are using an up-to-date version of Matlab you will not encounter the problem described by the OP.
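For a release without the worker limit, the loop from the question could be sketched with parfor roughly as follows. This is only a sketch under assumptions: generate_test_data, calculate_dispersion, data, n_tests and max_k are taken from the question as-is, n_workers = 64 is an assumed value matching the OP's machine, and parpool will refuse a pool larger than the NumWorkers setting of the default cluster profile, so that setting may need raising first. The results are collected in a separate matrix (test_dispersions, a name introduced here) so that parfor can slice it by the loop variable, then copied into rows 2..n_tests+1 as in the original loop.

```matlab
% Sketch only. n_workers is an assumed value; the two helper functions
% are the ones from the question, not toolbox functions.
n_workers = 64;
if isempty(gcp('nocreate'))      % start a pool only if none is running
    parpool(n_workers);
end

% One row per reference dataset, one column per value of k.
test_dispersions = zeros(n_tests, max_k);

parfor t = 1:n_tests
    test_data = generate_test_data(data);
    row = zeros(1, max_k);       % temporary, local to this iteration
    for id_k = 1:max_k
        row(id_k) = calculate_dispersion(test_data, id_k);
    end
    test_dispersions(t, :) = row;    % sliced output variable
end

% Row 1 of dispersions is left untouched, as in the original loop.
dispersions(2:n_tests+1, 1:max_k) = test_dispersions;
```

Parallelizing the outer loop rather than the inner one keeps each task large (one full reference dataset), which usually amortizes the communication overhead better.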
Upvotes: 3
Reputation: 1166
I had the same problem on a 32-core machine with 6 datasets. I overcame it by writing a shell script that started MATLAB six times, once for each dataset. I could do this because the computations weren't dependent on each other. From what I understand, you could use a similar approach, for example by starting about 6 instances, each handling about 16 datasets. How many instances you can run depends on how much RAM you have and how much each instance consumes.
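A minimal sketch of what each instance might run, assuming the real data has been saved to data.mat in a variable named data. The function name run_chunk, the file names and the argument list are hypothetical; generate_test_data and calculate_dispersion are the functions from the question. Each instance seeds the random number generator differently, since separate MATLAB sessions otherwise start from the same default seed and would generate identical reference datasets.

```matlab
function run_chunk(chunk_id, n_chunks, n_tests, max_k)
% Hypothetical entry point: one MATLAB instance processes one contiguous
% block of the reference datasets and saves its results to its own file.

% Assumption: the real data is stored in data.mat as variable "data".
loaded = load('data.mat', 'data');
data = loaded.data;

% Give every instance a different, reproducible random stream.
rng(chunk_id);

% Split 1..n_tests into n_chunks nearly equal contiguous blocks.
edges = round(linspace(0, n_tests, n_chunks + 1));
ids = (edges(chunk_id) + 1):edges(chunk_id + 1);

chunk_dispersions = zeros(numel(ids), max_k);
for i = 1:numel(ids)
    test_data = generate_test_data(data);
    for id_k = 1:max_k
        chunk_dispersions(i, id_k) = calculate_dispersion(test_data, id_k);
    end
end

% ids records which reference datasets this file covers.
save(sprintf('dispersions_chunk_%d.mat', chunk_id), 'ids', 'chunk_dispersions');
end
```

The shell script would then start one instance per chunk in the background, for example with matlab -nodisplay -r "run_chunk(1, 6, 100, 10); exit" (the values 100 and 10 are placeholders), and a final step would load the saved files and stack the rows.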
Upvotes: 0
Reputation: 1276
The fact that MATLAB imposes this restriction on its parallel toolbox often makes the toolbox not worth the money and effort. One workaround is to combine the MATLAB Compiler with virtual machines, using either VMware or VirtualBox.
This method is time-consuming and only worth it if it saves more time than porting the code would, and if the code is already highly optimised.
Upvotes: 1