Reputation: 13
I have two for loops for very large arrays (10k x 10k) or more. Obviously this program's part is a huge bottleneck and very time-consuming task.
There are 4 arrays: vm(10000,1)
, va(10000,1)
, yr(10000,10000)
, and yi(10000,10000)
for i = 1: 10000
psum = 0;
for j = 1: 10000
psum = psum + vm(i)*vm(j)*(yr(i,j)*cos(va(i)-va(j)) + yi(i,j)*sin(va(i)-va(j)));
end
pcal(i) = psum;
end
Upvotes: 1
Views: 152
Reputation: 15867
you can reformulate your equation according to theses trigonometric identities:
sin(a-b) = sin a cos b - cos a sin b;
cos(a-b) = cos a cos b + sin a sin b;
so precompute sine and cosine and use them in loop or bsxfun. this is loop version:
yr = rand(10000);
yi = rand(10000);
va = rand(1,10000);
vm = rand(1,10000);
sin_va = sin(va);
cos_va = cos(va);
for i = 1: 10000
pcal(i) = sum(vm(i)*vm.*(yr(i,:).*(cos_va(i) * cos_va + sin_va(i) * sin_va) + yi(i,:).*(sin_va(i) * cos_va - cos_va(i) * sin_va)));
end
Upvotes: 0
Reputation: 10450
The fastest option will be @Joans answer. However, in case you run into memory problems, here is a semi-vectorized option (only one loop):
pcal = zeros(N,1); % N is the 10000 in your example
for m = 1: N
va_va = va(m)-va(1:N);
pcal(m) = sum(vm(m)*vm(1:N).*(yr(m,1:N).'.*cos(va_va)+yi(m,1:N).'.*sin(va_va)));
end
And here is a benchmarking of this method along with yours and @Joans one, and another method using ndgrid
:
function sum_time
N = 10000;
vm = rand(N,1);
va = rand(N,1);
yr = rand(N);
yi = rand(N);
loop_time = timeit(@() loop(N,vm,va,yr,yi))
loop2_time = timeit(@() loop2(N,vm,va,yr,yi))
bsx_time = timeit(@() bsx(vm,va,yr,yi))
ndg_time = timeit(@() ndg(N,vm,va,yr,yi))
end
function pcal = loop(N,vm,va,yr,yi)
pcal = zeros(N,1);
for m = 1: N
psum = 0;
for n = 1: N
psum = psum + vm(m)*vm(n)*(yr(m,n)*cos(va(m)-va(n)) +...
yi(m,n)*sin(va(m)-va(n)));
end
pcal(m) = psum;
end
end
function pcal = loop2(N,vm,va,yr,yi)
pcal = zeros(N,1);
for m = 1: N
va_va = va(m)-va(1:N); % to avoid calculating twice
pcal(m) = sum(vm(m)*vm(1:N).*(yr(m,1:N).'.*cos(va_va)+yi(m,1:N).'.*sin(va_va)));
end
end
function pcal = bsx(vm,va,yr,yi)
pcal = sum(bsxfun(@times,vm,vm') .* (...
yr.*cos(bsxfun(@minus,va,va')) + ...
yi.*sin(bsxfun(@minus,va,va'))),2);
end
function pcal = ndg(N,vm,va,yr,yi)
[n,m] = ndgrid((1:N).',1:N);
yr_t = yr.';
yi_t = yi.';
va_va = va(m(:))-va(n(:));
vmt = vm(m(:)).*vm(n(:));
psum = vmt.*(yr_t(1:N^2).'.*cos(va_va)+yi_t(1:N^2).'.*sin(va_va));
pcal = sum(reshape(psum,N,N)).';
end
and the results (for N = 10000
):
loop_time =
7.0296
loop2_time =
3.3722
bsx_time =
1.2716
ndg_time =
6.3568
so taking one loop out saving ~50% of the time.
Upvotes: 0
Reputation: 74940
In your case, it's straightforward to calculate the sum in one go. Basically, you create arrays where the elements are the appropriate products and differences of vm
and va
, respectively (using bsxfun
), followed by element-wise multiplication and summation across the rows.
pcal = sum(bsxfun(@times,vm,vm') .* (...
yr.*cos(bsxfun(@minus,va,va')) + ...
yi.*sin(bsxfun(@minus,va,va'))),2);
Note that in vecorization, you tend to trade memory vs CPU cycles. If you do not have enough RAM, you may end up paging, which would slow the vectorized solution to a crawl.
Upvotes: 1