Reputation: 339
How to parallelize for-cycle in the attached fuction by parfor-cycle to get additional speedup in matlab?
Thanks in advance for any tips?
function p = permryser_fast( A )
% Computes the permament of square matrix, A
% The matrix permanent is defined like the matrix determinant, but
% without the sign terms.
mlock % locks the current M-file in memory
% suggest using "munlock('perm_ryser_fast')" at start of calling script
persistent pn pz ps % used to build tables for specific size
[m n]=size(A);
if (m ~= n),
error('Must be a square matrix. perm_ryser_fast() %d %d\n',m,n)
end
if isempty(ps)
pn=-1;
end
if (n~=pn)&&(n>1),
fprintf(1,'Creating table for fast Ryser n=%d %d\n',n,pn)
pn=n;
x=1:(2^n -1); % count (assumes n<=52)
y=bitxor(x,bitshift(x,-1)); % gray-coded count
pz=log2(double(bitxor([0 y(1:2^n-2)],y(1:2^n-1))))+1;
% computes which position comes in/out in gray-count
ps=(-1).^(mod(y./ 2.^(pz-1),2) < 1);
% computes whether its in (+1) or out (-1)
end
if (n == 1),
p = A;
else
p = 0; % running permanent accumulator
rs = zeros(1,n); % running row sums vector
% ==============================================
% Loop over all 2^n subsets of {1,...,n}
for i=1:(2^n -1) % Just skipping the null subset
rs = rs + ps(i) * A(pz(i),:);
p = p + (-1)^i * prod(rs);
end
% ==============================================
p = p * (-1)^m;
end
return
Example:
n = 20;
A = ones(n);
permanent = permryser_fast(A)
In this case the permanent should be equal factorial(n).
Remarks:
vectorization is able to greatly improve computation speed, but memory requirements are so terrible, that realistic sizes of matrices are up to 25 x 25
probably the only way is parallelization via parfor-loop after proper modification of recursion for-loop.
My final goal is to be able compute permanents of non-negative matrices with size from 25x25 to 35x35 in reasonable amount of time and accuracy.
Upvotes: 0
Views: 96
Reputation: 16354
Before trying parfor
you should try to vectorize your code to get rid of the loop:
function p = permryser_fast_vectorized( A )
% Computes the permament of square matrix, A
% The matrix permanent is defined like the matrix determinant, but
% without the sign terms.
% suggest using "munlock('perm_ryser_fast')" at start of calling script
persistent pn pz ps % used to build tables for specific size
[m n]=size(A);
if (m ~= n),
error('Must be a square matrix. perm_ryser_fast() %d %d\n',m,n)
end
if isempty(ps)
pn=-1;
end
if (n~=pn)&&(n>1),
fprintf(1,'Creating table for fast Ryser n=%d %d\n',n,pn)
pn=n;
x=1:(2^n -1); % count (assumes n<=52)
y=bitxor(x,bitshift(x,-1)); % gray-coded count
pz=log2(double(bitxor([0 y(1:2^n-2)],y(1:2^n-1))))+1;
% computes which position comes in/out in gray-count
ps=(-1).^(mod(y./ 2.^(pz-1),2) < 1);
% computes whether its in (+1) or out (-1)
end
if (n == 1),
p = A;
else
% === vectorized version starts here
k = 1:(2^n -1);
ps_times_A = bsxfun(@times,A(pz(k),:),ps(k)');
rs = cumsum(ps_times_A);
p = (-1).^k.*prod(rs,2)';
p = sum(p) * (-1)^m;
% === vectorized version ends here
end
return
I benchmarked my vectorized version against your code using this benchmark script:
n=20;
A=ones(n);
runs = 10;
% run original loop-based code
tic;
for k=1:runs
permanent_loop = permryser_fast(A);
end
t_loop = toc/runs;
% run vectorized code
tic;
for k=1:runs
permanent_vectorized = permryser_fast_vectorized(A);
end
t_vectorized = toc/runs;
fprintf('loop: %f s\n',t_loop);
fprintf('vectorized: %f s\n',t_vectorized);
output
loop: 1.446856 s
vectorized: 0.163842 s
The vectorized version is more than 8x faster.
Upvotes: 1