Edgar Derby

Reputation: 2825

Parallel Processing in MATLAB with more than 12 cores

I wrote a function that estimates the right number of clusters k for a dataset using the gap statistic. At one point the algorithm has to compute the dispersion (i.e., the sum of the distances between every point and its centroid) for, say, 100 different datasets (called "test datasets" or "reference datasets"). Since these computations are independent, I want to run them in parallel across all the cores. I have MathWorks' Parallel Computing Toolbox but I am not sure how to use it (problem 1; I can probably work this out from past threads). My real problem is a different one: the toolbox seems to allow only 12 cores (problem 2). My machine has 64 cores and I need to use all of them. Do you know how to parallelize a process across more than 12 cores?

For your information this is the bit of code that should run in parallel:

% This loop runs n_tests times, where n_tests is the number
% of reference datasets we want to use
for id_test = 2:n_tests+1

    test_data = generate_test_data(data);

    %% Calculate the dispersion(s) for the generated dataset(s)

    dispersions(id_test, 1:1:max_k) = zeros;

    % We calculate the dispersion for the id_test reference dataset
    for id_k = 1:1:max_k
        dispersions(id_test, id_k) = calculate_dispersion(test_data, id_k);
    end
end

Upvotes: 1

Views: 3239

Answers (4)

Edric

Reputation: 25140

Please note that in R2014a the limit on the number of local workers was removed. See the release notes.
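
For problem 1, here is a minimal sketch of how the loop from the question might be turned into a parfor loop. It assumes R2014a or later (so parpool is available and the worker limit is gone). The handles gen_fn and disp_fn are placeholders I made up so the snippet runs on its own; they stand in for the question's generate_test_data and calculate_dispersion and do not compute a real clustering dispersion.

```matlab
% Toy inputs; replace with the real data and parameters.
data    = rand(200, 2);
n_tests = 20;
max_k   = 5;

% Placeholder for generate_test_data: uniform sample over the
% bounding box of the data (assumption, not the asker's function).
gen_fn = @(d) bsxfun(@plus, min(d, [], 1), ...
    bsxfun(@times, rand(size(d)), max(d, [], 1) - min(d, [], 1)));

% Placeholder for calculate_dispersion: NOT a real clustering
% dispersion, just a cheap stand-in that depends on k.
disp_fn = @(X, k) sum(sum(bsxfun(@minus, X, mean(X, 1)).^2)) / k;

% Start a pool if none is running. With no argument, parpool uses the
% worker count of the default profile; parpool(64) would ask for 64.
if isempty(gcp('nocreate'))
    parpool;
end

% Row 1 is left for the real data, as in the question's indexing.
dispersions = zeros(n_tests + 1, max_k);

parfor id_test = 2:n_tests+1
    test_data = gen_fn(data);

    % Fill a temporary row, then write it back in one sliced
    % assignment so parfor can classify dispersions as sliced output.
    row = zeros(1, max_k);
    for id_k = 1:max_k
        row(id_k) = disp_fn(test_data, id_k);
    end
    dispersions(id_test, :) = row;
end
```

The temporary row is there because parfor wants the output matrix indexed in a single consistent way by the loop variable. Each iteration only touches its own row, so the iterations stay independent.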

Upvotes: 5

Tik0

Reputation: 2699

The number of local workers available with Parallel Computing Toolbox is license dependent. When introduced, the limit was 4; this changed to 8 in R2009a; and to 12 in R2011b.

If you want to use 16 workers, you will need a 16-node MATLAB Distributed Computing Server (MDCS) licence, and you'll also need to set up a scheduler to manage them. There are detailed instructions on how to do this here: http://www.mathworks.de/support/product/DM/installation/ver_current/. Once you've done that, yes, you'll be able to run "matlabpool open 16".

EDIT: As of MATLAB R2014a there is no longer a limit on the number of local workers for the Parallel Computing Toolbox. That is, if you are using an up-to-date version of MATLAB you will not encounter the problem described by the OP.
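
On a recent release, something along these lines should show how many workers the default profile allows and open a larger pool. This is a rough sketch, assuming R2014a or later, a local default profile, and a machine with enough cores and RAM for 64 workers; n_requested is just an illustrative variable name.

```matlab
% Cluster object for the default profile (usually the local machine).
c = parcluster();
fprintf('Profile "%s" allows up to %d workers\n', c.Profile, c.NumWorkers);

n_requested = 64;   % assumption: matches the 64 cores in the question

% Raise the limit on this cluster object if it is too low. This only
% changes the object in memory; saveProfile(c) would make it permanent.
if c.NumWorkers < n_requested
    c.NumWorkers = n_requested;
end

% Open the pool on that cluster unless one is already running.
if isempty(gcp('nocreate'))
    pool = parpool(c, n_requested);
end
```

On releases before R2013b the equivalent call was matlabpool, as quoted above, and the 12-worker cap still applied to the local scheduler.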

Upvotes: 3

JaKu

Reputation: 1166

I had the same problem with a 32-core machine and 6 datasets. I got around it by writing a shell script that started MATLAB six times, once for each dataset. I could do this because the computations were independent of each other. From what I understand, you could use a similar approach: start around 6 instances, each processing around 16 datasets. How many instances you can run depends on how much RAM you have and how much each instance consumes.
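
A rough sketch of what each instance could run is below. The script name run_part, the variable names, and the placeholder functions are all made up for illustration; the idea is that the launcher starts each copy with something like matlab -nodisplay -r "instance_id = 3; run_part; exit", and every copy works out its own share of the reference datasets.

```matlab
% Set by the launcher; default to 1 so the script also runs by hand.
if ~exist('instance_id', 'var')
    instance_id = 1;
end
n_instances = 6;     % number of MATLAB copies started by the launcher
n_tests     = 100;   % reference datasets in total
max_k       = 5;
data        = rand(200, 2);   % toy data; load the real data here

% Every fresh MATLAB session starts from the same default seed, so
% seed per instance to avoid identical reference datasets.
rng(instance_id);

% Round-robin split: instance 1 gets datasets 1, 7, 13, ...
my_ids = instance_id:n_instances:n_tests;

% Placeholders for generate_test_data and calculate_dispersion
% (assumptions, not the asker's functions).
gen_fn  = @(d) bsxfun(@plus, min(d, [], 1), ...
    bsxfun(@times, rand(size(d)), max(d, [], 1) - min(d, [], 1)));
disp_fn = @(X, k) sum(sum(bsxfun(@minus, X, mean(X, 1)).^2)) / k;

part = zeros(numel(my_ids), max_k);
for i = 1:numel(my_ids)
    test_data = gen_fn(data);
    for id_k = 1:max_k
        part(i, id_k) = disp_fn(test_data, id_k);
    end
end

% One result file per instance; a final script can load and merge them.
out_file = fullfile(tempdir, sprintf('dispersions_part_%d.mat', instance_id));
save(out_file, 'my_ids', 'part');
```

Saving my_ids next to the results makes it easy to put the rows back in the right order when the parts are merged.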

Upvotes: 0

Philliproso

Reputation: 1276

The fact that MATLAB imposes this restriction on its parallel toolbox often makes it not worth the money and effort of using it. One way around it is to combine the MATLAB Compiler with virtual machines, using either VMware or VirtualBox.

  1. Compile the code required to run your tests.
  2. Load your compiled code, together with the MCR (MATLAB Compiler Runtime), onto a VM template.
  3. Create multiple copies of the VM template, and let each copy run the required calculations for some of the datasets.
  4. Gather the results from all the copies.

This method is time-consuming, and only worth it if it saves more time than porting the code would and the code is already highly optimised.

Upvotes: 1
