Reputation:

Vectorization when mapping between indices in an assignment is not injective

Suppose that c is a scalar value, T and W are M-by-N matrices, k is another M-by-N matrix containing values from 1 to M (and there are at least two pairs (i1, j1), (i2, j2) such that k(i1, j1)==k(i2, j2)) and a is a 1-by-M vector. I want to vectorize the following code (hoping that this will speed it up):

T = zeros(M,N);
for j = 1:N
    for i = 1:M
        T(k(i,j),j) = T(k(i,j),j) + c*W(i,j)/a(i);
    end
end

Do you have any tips so that I can vectorize this code (or make it faster in general)?

Thanks in advance!

Upvotes: 2

Answers (3)

gnovice

Reputation: 125864

Since k only ever effects how values are aggregated within a column, but not between columns, you can achieve a slight speedup by reducing the problem to a single loop over columns and using accumarray like so:

T = zeros(M, N);
for col = 1:N
  T(:, col) = accumarray(k(:,col), c*W(:, col)./a, [M 1]);
end

I tested each of the solutions (the loop in your question, rahnema's, Divakar's, and mine) by taking the average of 100 iterations using input values initialized as in Divakar's answer. Here's what I got (running Windows 7 x64, 16 GB RAM, MATLAB R2016b):

solution | avg. time (s) | max(abs(err)) 
---------+---------------+---------------
loop     |       0.12461 | 0 
rahnema  |       0.84518 | 0 
divakar  |       0.12381 | 1.819e-12 
gnovice  |       0.09477 | 0

The take-away: loops actually aren't so bad, but if you can simplify them into one it can save you a little time.

Upvotes: 2

Divakar

Reputation: 221584

Here's an approach with a combination of bsxfun and accumarray -

% Create 2D array of unique IDs along each col to be used as flattened subs
id = bsxfun(@plus,k,M*(0:N-1));

% Compute "c*W(i,j)/a(i)" for all i's and j's
cWa = c*bsxfun(@rdivide,W,a);

% Accumulate final result for all cols
out = reshape(accumarray(id(:),reshape(cWa,[],1),[M*N 1]),[M,N]);

Benchmarking

Approaches as functions -

function out = func1(W,a,c,k,M,N)
id = bsxfun(@plus,k,M*(0:N-1));
cWa = c*bsxfun(@rdivide,W,a);
out = reshape(accumarray(id(:),reshape(cWa,[],1),[M*N 1]),[M,N]);

function T = func2(W,a,c,k,M,N)  % @rahnema1's solution
[I J] = meshgrid(1:M,1:N);
idx1 = sub2ind([M N], I ,J);
R = c.* W(idx1) ./ a(I);
T = accumarray([k(idx1(:)) ,J(:)], R(:),[M N]);

function T = func3(W,a,c,k,M,N)  % Original approach
T = zeros(M,N);
for j = 1:N
    for i = 1:M
        T(k(i,j),j) = T(k(i,j),j) + c*W(i,j)/a(i);
    end
end

function T = func4(W,a,c,k,M,N) % @gnovice's solution
T = zeros(M, N);
for col = 1:N
  T(:, col) = accumarray(k(:,col), c*W(:, col)./a, [M 1]);
end

Machine setup : Kubuntu 16.04, MATLAB 2012a, 4GB RAM.

Timing code -

% Setup inputs
M = 3000;
N = 3000;
W = rand(M,N);
a = rand(M,1);
c = 2.34;
k = randi([1,M],[M,N]);

disp('------------------ With func1')
tic,out = func1(W,a,c,k,M,N);toc
clear out

disp('------------------ With func2')
tic,out = func2(W,a,c,k,M,N);toc
clear out

disp('------------------ With func3')
tic,out = func3(W,a,c,k,M,N);toc
clear out

disp('------------------ With func4')
tic,out = func4(W,a,c,k,M,N);toc

Timing code run -

------------------ With func1
Elapsed time is 0.215591 seconds.
------------------ With func2
Elapsed time is 1.555373 seconds.
------------------ With func3
Elapsed time is 0.572668 seconds.
------------------ With func4
Elapsed time is 0.291552 seconds.

Possible improvements in proposed approach

1] In c*bsxfun(@rdivide,W,a), we are use two stages of broadcasting - One at bsxfun(@rdivide,W,a), where a is broadcasted ; Second one when c is broadcasted to match-up against the 2D output of bsxfun(@rdivide,W,a), though we don't need bsxfun for this one. So, a possible improvement would be if we insert-in c to be divided by a, where c would be only broadcasted to 1D, instead of 2D and then the second level of broadcasting would be1D: c/a to 2D : W just like before. This minor improvement could be timed -

>> tic, c*bsxfun(@rdivide,W,a); toc
Elapsed time is 0.073244 seconds.
>> tic, bsxfun(@times,W,c/a); toc
Elapsed time is 0.041745 seconds.

But, in cases where c and a differ by a lot, the scaling factor c/a would affect the final result by appreciably. So, one need to be careful with this suggestion.

Upvotes: 2

rahnema1

Reputation: 15837

A possible solution:

[I J] = meshgrid(1:M,1:N);
idx1 = sub2ind([M N], I ,J);
R = c.* W(idx1) ./ a(I);
T = accumarray([K(idx1(:)) ,J(:)], R(:),[M N]);

Comparison of different methods in Octave without jit:

------------------ Divakar
Elapsed time is 0.282008 seconds.
------------------ rahnema1
Elapsed time is 1.08827 seconds.
------------------ gnovice
Elapsed time is 0.418701 seconds.
------------------ loop
doesn't complete in 15 seconds.

Upvotes: 1

Vectorization when mapping between indices in an assignment is not injective

Answers (3)

Benchmarking

Related Questions