Reputation: 29096
I've read in THIS comment on SO that Matlab is no longer slow at for loops (cf. link).
I used to use Matlab quite a lot during my studies, and I remember how much time I saved by always finding a solution that doesn't involve excessive loops (by using reshape, repmat or arrayfun instead).
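For context, a typical loop-free idiom built from these functions might look like the following sketch (the example values are my own, not from the question):

```matlab
% Build a 3x4 table of products i*j without any loop:
% repmat tiles a vector, arrayfun applies a function element-wise,
% and reshape reorganizes dimensions.
i = (1:3)';                                    % column vector
j = 1:4;                                       % row vector
P  = repmat(i, 1, 4) .* repmat(j, 3, 1);       % 3x4 outer product
P2 = reshape(arrayfun(@(k) k^2, 1:12), 3, 4);  % squares 1..144, as 3x4
```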
So this article above caught my attention and I quickly wrote this:
clear all; T = linspace(0,1,1e6);
tic                        % 1) loop, y grows on every iteration
i = 0;
for t = T
    i = i + 1;
    y(i) = sin(t);
end
toc

clear all; T = linspace(0,1,1e6);
tic                        % 2) loop with preallocated y
i = 0;
y = zeros(numel(T), 1);
for t = T
    i = i + 1;
    y(i) = sin(t);
end
toc

clear all; T = linspace(0,1,1e6);
tic                        % 3) vectorized
y = sin(T);
toc
which outputs this:
Elapsed time is 1.741640 seconds.
Elapsed time is 1.400412 seconds.
Elapsed time is 0.004076 seconds.
I also tried to toggle the accel feature:
>> feature accel on
But each time, even for more complex matrix manipulations, the vectorized version that uses native Matlab functions is always faster.
Perhaps I am missing some important point or I am just still right with my opinion: with Matlab we should always avoid loops as much as possible.
Now, I am looking for a counterexample.
Upvotes: 2
Views: 126
Reputation: 221524
A few examples could be suggested to study for-loops versus vectorization for performance.
The first is a very basic computation: calculating the sine of a number of elements, with the element count varied to assess the problem at hand. Inspired by this screenshot link .
Benchmarking Code
num_runs = 1000;
N_arr = [1000 10000 100000 1000000];

%// Warm up tic/toc.
for k = 1:100
    tic(); elapsed = toc();
end

for k = 1:numel(N_arr)
    N = N_arr(k);

    tic
    for runs = 1:num_runs
        out_f1 = zeros(1,N);
        for t = 1:N
            out_f1(t) = sin(t);
        end
    end
    t_forloop = toc/num_runs;

    tic
    for runs = 1:num_runs
        out_v1 = sin(1:N);
    end
    t_vect = toc/num_runs;

    fprintf('----------- Datsize(N) = %d -------------\n', N);
    fprintf('Elapsed time with for-loops - %g\n', t_forloop);
    fprintf('Elapsed time with vectorized code - %g\n', t_vect);
end
Results
----------- Datsize(N) = 1000 -------------
Elapsed time with for-loops - 7.1826e-05
Elapsed time with vectorized code - 8.3601e-05
----------- Datsize(N) = 10000 -------------
Elapsed time with for-loops - 0.00068531
Elapsed time with vectorized code - 0.00045043
----------- Datsize(N) = 100000 -------------
Elapsed time with for-loops - 0.0074613
Elapsed time with vectorized code - 0.0053368
----------- Datsize(N) = 1000000 -------------
Elapsed time with for-loops - 0.077707
Elapsed time with vectorized code - 0.053255
Please note that these results were consistent with timeit results (the code and results for those aren't shown here).
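The timeit measurement referred to above could look something like this sketch (the helper names are my own, not from the answer; timeit runs a function handle several times and returns a robust time estimate):

```matlab
% bench_timeit.m -- sketch of a timeit-based version of the benchmark.
function bench_timeit()
    for N = [1000 10000 100000 1000000]
        t_forloop = timeit(@() sin_loop(N));   % loop wrapped in a handle
        t_vect    = timeit(@() sin(1:N));      % vectorized version
        fprintf('N = %7d : loop %.4g s, vectorized %.4g s\n', ...
                N, t_forloop, t_vect);
    end
end

function out = sin_loop(N)
    out = zeros(1, N);
    for t = 1:N
        out(t) = sin(t);
    end
end
```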
Conclusions
Vectorized code seems to overtake for-loops as quickly as the 10000-element case; only for small datasizes do for-loops win.
Next, let's consider a case of using an array of elements inside each iteration of the for-loop. Let it store sine, cosine, tan and sec into one column in each iteration, i.e. [sin(t) ; cos(t) ; tan(t) ; sec(t)].
For-loop code would be -
out_f1 = zeros(4,N);
for t = 1:N
    out_f1(:,t) = [sin(t) ; cos(t) ; tan(t) ; sec(t)];
end
Vectorized code -
out_v1 = [sin(1:N); cos(1:N) ; tan(1:N); sec(1:N)];
Results
----------- Datsize(N) = 100 -------------
Elapsed time with for-loops - 0.00011861
Elapsed time with vectorized code - 6.0569e-05
----------- Datsize(N) = 1000 -------------
Elapsed time with for-loops - 0.0011867
Elapsed time with vectorized code - 0.00036786
----------- Datsize(N) = 10000 -------------
Elapsed time with for-loops - 0.011819
Elapsed time with vectorized code - 0.0025536
----------- Datsize(N) = 1000000 -------------
Elapsed time with for-loops - 1.2329
Elapsed time with vectorized code - 0.33383
Modified case
One could easily jump to the conclusion that the for-loop doesn't stand a chance here. But wait, how about we do element-wise assignment again, as in example #1, for the for-loop case, like this -
out_f1 = zeros(4,N);
for t = 1:N
    out_f1(1,t) = sin(t);
    out_f1(2,t) = cos(t);
    out_f1(3,t) = tan(t);
    out_f1(4,t) = sec(t);
end
Now, this uses spatial locality, so a competitive vectorized code using the same would be -
out_v1 = [sin(1:N) cos(1:N) tan(1:N) sec(1:N)]';
The benchmark results with these modified codes for this testcase were -
----------- Datsize(N) = 100 -------------
Elapsed time with for-loops - 3.1987e-05
Elapsed time with vectorized code - 6.9778e-05
----------- Datsize(N) = 1000 -------------
Elapsed time with for-loops - 0.00027976
Elapsed time with vectorized code - 0.00036804
----------- Datsize(N) = 10000 -------------
Elapsed time with for-loops - 0.0029712
Elapsed time with vectorized code - 0.0024423
----------- Datsize(N) = 100000 -------------
Elapsed time with for-loops - 0.031113
Elapsed time with vectorized code - 0.028549
----------- Datsize(N) = 1000000 -------------
Elapsed time with for-loops - 0.32636
Elapsed time with vectorized code - 0.28063
Conclusions
The latter benchmark results seem to prove again that for up to 10000 elements the for-loop wins, and after that vectorized solutions would be preferred. But it must be noted that this came at the expense of writing element-wise assignments.
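The spatial-locality remark above follows from MATLAB arrays being stored in column-major order, which is easy to check (a small sketch):

```matlab
A = [1 2; 3 4];
% Linear (memory) order runs down each column first:
A(:)'    % ans = 1 3 2 4
```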
Upvotes: 2
Reputation: 6060
The problem is what different people consider "slow".
When MATLAB for loops go from "unbelievably abysmally slow" to "8 times slower than the vectorized version", there will be people who call them fast now, people who call them acceptable, and people who still call them slow.
In my opinion, MATLAB is still slow at loops (guess I'm in group three) and you should vectorize whenever possible (unless readability suffers). Just because loops were even slower in the past does not make the current performance good.
Also, MATLAB has got some other weak spots: https://stackoverflow.com/a/17933146/1974021
Upvotes: 4
Reputation: 36710
Use a real loop index so the JIT compiler understands your loop:
clear all; T = linspace(0,1,1e6);
tic
y = zeros(numel(T), 1);
for idx = 1:numel(T)
    y(idx) = sin(T(idx));
end
toc
Such code is much faster. The optimisations are based on code analysis; write clear code and give Matlab a chance to analyse it successfully ;)
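For comparison with the arrayfun approach mentioned in the question, the same computation can be written without an explicit loop; in practice this is usually slower than both the indexed loop above and plain sin(T) because of per-call overhead (a sketch):

```matlab
clear all; T = linspace(0,1,1e6);
tic
y = arrayfun(@sin, T);   % element-wise sin via a function handle;
toc                      % concise, but typically slower than the JIT loop
```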
Upvotes: 1