Reputation: 593
Problem
I am currently processing data sets of roughly 18 million points that run through several processing steps. Using the profiler I found that one of my bottlenecks is the code below, so I was wondering whether it is possible to vectorize multiple if-statements.
Code
WA=zeros(size(NB_list_z,1),3);
for i=1:size(NB_list_z,1)
    if (NB_list_z(i,2)==0||NB_list_z(i,3)==0)
        WA(i,1)=BMLS(NB_list_z(i,1),5);
    else
        if (BMLS(NB_list_z(i,3),5)>=COG)
            WA(i,1)=(BMLS(NB_list_z(i,3),5)+BMLS(NB_list_z(i,2),5)+BMLS(NB_list_z(i,1),5))/3;
            if (WA(i,1)<COG)
                if (BMLS(NB_list_z(i,2),5)>=COG)
                    WA(i,1)=(BMLS(NB_list_z(i,2),5)+BMLS(NB_list_z(i,1),5))/2;
                    if (WA(i,1)<COG)
                        WA(i,1)=BMLS(NB_list_z(i,1),5);
                    end
                else
                    WA(i,1)=BMLS(NB_list_z(i,1),5);
                end
            end
        else
            if (BMLS(NB_list_z(i,2),5)>=COG)
                WA(i,1)=(BMLS(NB_list_z(i,2),5)+BMLS(NB_list_z(i,1),5))/2;
                if (WA(i,1)<COG)
                    WA(i,1)=BMLS(NB_list_z(i,1),5);
                end
            else
                WA(i,1)=BMLS(NB_list_z(i,1),5);
            end
        end
    end
end
Code Description
NB_list_z contains the indices of the neighbors of the points in the first column (in the z direction; every point can have up to two points above it).
BMLS contains the values that I want to check against the threshold.
COG is the threshold value.
Call the lowest block Block1, the one above it Block2, and the one above that Block3.
The first if-clause sets the value to the value of Block1 if no neighbors exist above it.
After that I want to combine blocks in the way that is most profitable for me. This means that if the average of Blocks 3+2+1 is above the threshold I want to include them all, but the highest block (here Block3) always has to be above the threshold on its own as well. If that fails, I try Blocks 2+1 under the same conditions, and if that fails too, I take only Block1. The code above works perfectly fine on small data sets but starts to take a lot of time on bigger ones.
Question
I am new to "code optimization" and "vectorization" in the sense that I have only just started with it. I found some entries about removing for-loops and the like, but I couldn't find anything about removing or simplifying multiple if-clauses. Hence the question: is it possible to vectorize nested if-clauses?
Upvotes: 1
Views: 582
Reputation: 4558
The code is a bit too long to rewrite on this forum, and definitely too long to rewrite without testing it against test data with expected output. However, let me write a bit about "vectorization" instead.
What is vectorization?
When we talk about vectorization in MATLAB, we commonly mean applying an operation to a whole vector at once instead of to each element in turn. Slightly oversimplified: instead of applying an operation to each element in a loop, we call a function that takes the whole vector as input. For this to be effective, the operation needs to be supported by MATLAB, meaning that the heavy work should be performed by compiled code (a built-in function or a mex-file).
How is it done?
When you want to apply this to all elements in a vector, it is really simple. For example, instead of doing,
a = 1:2:20;
total = 0;
for k = a %(range-based: k takes each value of a in turn)
    total = total + k;
end
%for ind = 1:length(a) %(same result)
%    total = total + a(ind);
%end
it is possible to do it like this instead,
a = 1:2:20;
total = sum(a);
If you have an if statement in the loop, it is still possible to vectorize it. Assume you want to sum all elements smaller than 11 and all elements larger than 11 separately,
a = 1:2:20;
total1 = sum(a(a<11));
total2 = sum(a(a>11));
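For comparison, here is a small sketch of the loop that these two lines replace. The names total1_loop and total2_loop are made up for this illustration and are not part of the example above.

```matlab
a = 1:2:20;

% Loop version: one if/elseif per element
total1_loop = 0;
total2_loop = 0;
for ind = 1:length(a)
    if a(ind) < 11
        total1_loop = total1_loop + a(ind);
    elseif a(ind) > 11
        total2_loop = total2_loop + a(ind);
    end
end

% Vectorized version: the logical masks a<11 and a>11 pick the
% elements that each branch of the if statement would have handled
total1 = sum(a(a<11));
total2 = sum(a(a>11));
```

Both versions give the same totals. The condition of each branch simply becomes a logical index.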
However, if you have nested if statements it gets more complicated. You will likely need to split the operation into a number of expressions, since each branch of the if statement needs to be handled separately. Each nested if statement can be seen as selecting a subset of the outer if statement, so it can be handled using and (&).
b = rand(10);
c = zeros(10);
c(b<0.5) = 0;                      % first branch
mid = b>=0.5 & b<0.8;              % nested condition, combined with &
c(mid) = 2*(b(mid).^2);
c(b>=0.8) = 1;                     % last branch
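Applied to the loop in the question, the same mask idea might look roughly like the sketch below. It has only been checked against the tiny made-up data set at the top, not against real data, so treat it as a starting point. The helper names (hasBoth, v1, v2, v3, avg2, avg3, use2, use3) and the sample values are assumptions introduced for this illustration.

```matlab
% Made-up sample data, for illustration only
COG  = 5;
BMLS = zeros(6,5);
BMLS(:,5) = [10; 4; 9; 2; 8; 1];
NB_list_z = [1 0 0; 2 1 3; 4 3 6; 6 4 2; 4 6 5; 6 5 3; 5 2 0];

n2 = NB_list_z(:,2);
n3 = NB_list_z(:,3);
hasBoth = n2~=0 & n3~=0;            % outer if: both neighbors exist

v1 = BMLS(NB_list_z(:,1),5);        % Block1 value for every row
v2 = zeros(size(v1));
v3 = zeros(size(v1));
v2(hasBoth) = BMLS(n2(hasBoth),5);  % index only where the neighbor index is nonzero
v3(hasBoth) = BMLS(n3(hasBoth),5);

avg3 = (v3+v2+v1)/3;                % same summation order as the loop
avg2 = (v2+v1)/2;

% Each branch of the nested ifs becomes one mask.
% ~(x<COG) mirrors the loop's "if (WA(i,1)<COG)" test.
use3 = hasBoth & v3>=COG & ~(avg3<COG);
use2 = hasBoth & ~use3 & v2>=COG & ~(avg2<COG);

WA = zeros(size(NB_list_z,1),3);
WA(:,1) = v1;                       % default: Block1 alone
WA(use3,1) = avg3(use3);            % Blocks 3+2+1
WA(use2,1) = avg2(use2);            % Blocks 2+1
```

Before relying on something like this, run it and the original loop on a small subset of the real data and compare the two results with isequal. Also keep in mind that every temporary here has one element per row of NB_list_z, which adds up for 18 million points.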
When do I vectorize?
It may not be worth vectorizing a function that is used only a few times and executes "sufficiently fast". Beyond that it becomes a trade-off between complexity and efficiency. A function executing in a 100th of a second may still need optimizing if it is called 10000 times during a run. Normally the more general functions need optimization, since they tend to attract a higher number of calls. Also, nested for loops where there is a dependency between the loops tend to be hard to vectorize.
a = 2:2:20;
for m = 1:length(a)
    for n = 1:length(a)
        if (m~=n)
            a(n) = a(n)/2;
        end
    end
    a(a>5) = 2*a(a>5);
end
This becomes quite complicated, since the inner loop depends on the specific iteration of the outer loop. It may still be possible to solve, but you will have a problem similar to finding an orthogonal parametrization for a double integral. If it is not absolutely necessary, it may not be worth the effort. And even if vectorizing is crucial, it may be better to reformulate the problem in a more vectorizable way than to vectorize these loops directly.
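A middle ground is to vectorize only the inner loop and keep the outer one. The sketch below does that for the nested loops above. The mask name others is made up for this illustration, and the outer loop stays because each pass depends on the result of the previous one.

```matlab
a = 2:2:20;
for m = 1:length(a)
    others = true(size(a));    % mask selecting every element ...
    others(m) = false;         % ... except the m-th one
    a(others) = a(others)/2;   % replaces the inner loop over n
    a(a>5) = 2*a(a>5);
end
```

This works because the iterations of the inner loop are independent of each other: each one touches only its own a(n). Whether the remaining outer loop is worth removing depends on how often the code runs.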
Some last words
Note that for large data sets, vectorization may create temporary copies of a large number of elements. Also make sure that you are not modifying the input to a function: MATLAB uses copy-on-write, so the input is copied as soon as it is modified inside the function.
Upvotes: 2