Jose Vivas
Jose Vivas

Reputation: 13

vectorizing summation with double for loops in matlab

I have two for loops for very large arrays (10k x 10k) or more. Obviously this program's part is a huge bottleneck and very time-consuming task.

There are 4 arrays: vm(10000,1), va(10000,1), yr(10000,10000), and yi(10000,10000)

for i = 1: 10000
    psum = 0;
    for j = 1: 10000
    psum = psum + vm(i)*vm(j)*(yr(i,j)*cos(va(i)-va(j)) + yi(i,j)*sin(va(i)-va(j)));
    end
pcal(i) = psum;
end

Upvotes: 1

Views: 152

Answers (3)

rahnema1
rahnema1

Reputation: 15867

you can reformulate your equation according to theses trigonometric identities:

sin(a-b) = sin a cos b - cos a sin b;
cos(a-b) = cos a cos b + sin a sin b;

so precompute sine and cosine and use them in loop or bsxfun. this is loop version:

yr = rand(10000);
yi = rand(10000);
va = rand(1,10000);
vm = rand(1,10000);
sin_va = sin(va);
cos_va = cos(va);
for i = 1: 10000
    pcal(i) =  sum(vm(i)*vm.*(yr(i,:).*(cos_va(i) * cos_va + sin_va(i) * sin_va) + yi(i,:).*(sin_va(i) * cos_va - cos_va(i) * sin_va)));
end

Upvotes: 0

EBH
EBH

Reputation: 10450

The fastest option will be @Joans answer. However, in case you run into memory problems, here is a semi-vectorized option (only one loop):

pcal = zeros(N,1); % N is the 10000 in your example
for m = 1: N
    va_va = va(m)-va(1:N);
    pcal(m) = sum(vm(m)*vm(1:N).*(yr(m,1:N).'.*cos(va_va)+yi(m,1:N).'.*sin(va_va)));
end

And here is a benchmarking of this method along with yours and @Joans one, and another method using ndgrid:

function sum_time
N = 10000;
vm = rand(N,1);
va = rand(N,1);
yr = rand(N);
yi = rand(N);

loop_time = timeit(@() loop(N,vm,va,yr,yi))
loop2_time = timeit(@() loop2(N,vm,va,yr,yi))
bsx_time = timeit(@() bsx(vm,va,yr,yi))
ndg_time = timeit(@() ndg(N,vm,va,yr,yi))
end

function pcal = loop(N,vm,va,yr,yi)
pcal = zeros(N,1);
for m = 1: N
    psum = 0;
    for n = 1: N
        psum = psum + vm(m)*vm(n)*(yr(m,n)*cos(va(m)-va(n)) +...
            yi(m,n)*sin(va(m)-va(n)));
    end
    pcal(m) = psum;
end
end

function pcal = loop2(N,vm,va,yr,yi)
pcal = zeros(N,1);
for m = 1: N
    va_va = va(m)-va(1:N); % to avoid calculating twice
    pcal(m) = sum(vm(m)*vm(1:N).*(yr(m,1:N).'.*cos(va_va)+yi(m,1:N).'.*sin(va_va)));
end
end

function pcal = bsx(vm,va,yr,yi)
pcal = sum(bsxfun(@times,vm,vm') .* (...
    yr.*cos(bsxfun(@minus,va,va')) + ...
    yi.*sin(bsxfun(@minus,va,va'))),2);
end

function pcal = ndg(N,vm,va,yr,yi)
[n,m] = ndgrid((1:N).',1:N);
yr_t = yr.';
yi_t = yi.';
va_va = va(m(:))-va(n(:));
vmt = vm(m(:)).*vm(n(:));
psum = vmt.*(yr_t(1:N^2).'.*cos(va_va)+yi_t(1:N^2).'.*sin(va_va));
pcal = sum(reshape(psum,N,N)).';
end

and the results (for N = 10000):

loop_time =
       7.0296
loop2_time =
       3.3722
bsx_time =
       1.2716
ndg_time =
       6.3568

so taking one loop out saving ~50% of the time.

Upvotes: 0

Jonas
Jonas

Reputation: 74940

In your case, it's straightforward to calculate the sum in one go. Basically, you create arrays where the elements are the appropriate products and differences of vm and va, respectively (using bsxfun), followed by element-wise multiplication and summation across the rows.

pcal = sum(bsxfun(@times,vm,vm') .* (...
    yr.*cos(bsxfun(@minus,va,va')) + ...
    yi.*sin(bsxfun(@minus,va,va'))),2);

Note that in vecorization, you tend to trade memory vs CPU cycles. If you do not have enough RAM, you may end up paging, which would slow the vectorized solution to a crawl.

Upvotes: 1

Related Questions