michal
michal

Reputation: 339

matlab parallization by parfor

How to parallelize for-cycle in the attached fuction by parfor-cycle to get additional speedup in matlab?

Thanks in advance for any tips?

function p = permryser_fast( A )
% Computes the permament of square matrix, A
%   The matrix permanent is defined like the matrix determinant, but
% without the sign terms.

mlock % locks the current M-file in memory 
% suggest using "munlock('perm_ryser_fast')" at start of calling script
persistent pn pz ps % used to build tables for specific size

[m n]=size(A);

if (m ~= n),
   error('Must be a square matrix. perm_ryser_fast()  %d %d\n',m,n)
end

if isempty(ps)
   pn=-1;
end

if (n~=pn)&&(n>1),
   fprintf(1,'Creating table for fast Ryser n=%d %d\n',n,pn)
   pn=n;
   x=1:(2^n -1);                % count (assumes n<=52)
   y=bitxor(x,bitshift(x,-1));  % gray-coded count
   pz=log2(double(bitxor([0 y(1:2^n-2)],y(1:2^n-1))))+1;
   % computes which position comes in/out in gray-count
   ps=(-1).^(mod(y./ 2.^(pz-1),2) < 1);
   % computes whether its in (+1) or out (-1)
end

if (n == 1),
  p = A;
else
  p = 0;  % running permanent accumulator
  rs = zeros(1,n);  % running row sums vector
  % ==============================================
  % Loop over all 2^n subsets of {1,...,n}
  for i=1:(2^n -1) % Just skipping the null subset
    rs = rs + ps(i) * A(pz(i),:);
    p = p + (-1)^i * prod(rs);
  end
  % ==============================================

  p = p * (-1)^m;
end

return

Example:

n = 20;
A = ones(n);
permanent = permryser_fast(A)

In this case the permanent should be equal factorial(n).

Remarks:

  1. vectorization is able to greatly improve computation speed, but memory requirements are so terrible, that realistic sizes of matrices are up to 25 x 25

  2. probably the only way is parallelization via parfor-loop after proper modification of recursion for-loop.

  3. My final goal is to be able compute permanents of non-negative matrices with size from 25x25 to 35x35 in reasonable amount of time and accuracy.

Upvotes: 0

Views: 96

Answers (1)

m.s.
m.s.

Reputation: 16354

Before trying parfor you should try to vectorize your code to get rid of the loop:

function p = permryser_fast_vectorized( A )
% Computes the permament of square matrix, A
%   The matrix permanent is defined like the matrix determinant, but
% without the sign terms.

% suggest using "munlock('perm_ryser_fast')" at start of calling script
persistent pn pz ps % used to build tables for specific size

[m n]=size(A);

if (m ~= n),
   error('Must be a square matrix. perm_ryser_fast()  %d %d\n',m,n)
end

if isempty(ps)
   pn=-1;
end

if (n~=pn)&&(n>1),
   fprintf(1,'Creating table for fast Ryser n=%d %d\n',n,pn)
   pn=n;
   x=1:(2^n -1);                % count (assumes n<=52)
   y=bitxor(x,bitshift(x,-1));  % gray-coded count
   pz=log2(double(bitxor([0 y(1:2^n-2)],y(1:2^n-1))))+1;
   % computes which position comes in/out in gray-count
   ps=(-1).^(mod(y./ 2.^(pz-1),2) < 1);
   % computes whether its in (+1) or out (-1)
end

if (n == 1),
  p = A;
else

  % === vectorized version starts here
  k = 1:(2^n -1); 
  ps_times_A = bsxfun(@times,A(pz(k),:),ps(k)');
  rs = cumsum(ps_times_A);
  p = (-1).^k.*prod(rs,2)';
  p = sum(p) * (-1)^m;
  % === vectorized version ends here
end

return

I benchmarked my vectorized version against your code using this benchmark script:

n=20;
A=ones(n);
runs = 10;
% run original loop-based code
tic;
for k=1:runs
    permanent_loop = permryser_fast(A);
end
t_loop = toc/runs;

% run vectorized code
tic;
for k=1:runs
    permanent_vectorized = permryser_fast_vectorized(A);
end
t_vectorized = toc/runs;

fprintf('loop: %f s\n',t_loop);
fprintf('vectorized: %f s\n',t_vectorized);

output

loop: 1.446856 s
vectorized: 0.163842 s

The vectorized version is more than 8x faster.

Upvotes: 1

Related Questions