roni
roni

Reputation: 1454

Optimizing a program by vectorized notation

Hi all I am working on Image processing and have written a short piece of code in MATLAB. The code is quite slow.

I am giving my code snippet here

for i=1:10
    //find c1,c2,c3
    //c1 c2 and c3 change at each iteration

    u = (1./((abs(P-c1))^m) + 1./((abs(P-c2))^m) + 1./((abs(P-c3))^m));
    u1 = 1./((abs(P-c1))^m)./u;
    u2 = 1./((abs(P-c2))^m)./u;    
    u3 = 1./((abs(P-c3))^m)./u;
end

Let me explain the variables here:

P,u,u1,u2 and u3 are all matrices of size 512x512
c1,c2 and c3 are constants of dimension 1x1
m is a constant with value = 2

I want to repeat this operations in a loop (say 10 times). However my code is quite slow.

The results of the profiler are given below :

The total running time of the program was 4.6 secs. However the four steps listed above itself takes abour 80% of the time.

enter image description here

So I wanted to make my code run faster. MY FIRST EDIT

My changed code snippet

for i=1:10

    //find c1 and c2
    //c1 and c2 changes at each iteration

    a=((abs(P-c1))^m); 
    b=((abs(P-c2))^m); 
    c=((abs(P-c3))^m);

    x=1./a; y=1./b; z=1./c;
    u = (x + y + z);
    u1 = x./u;
    u2 = y./u;    
    u3 = z./u;
end

Now the program computes in 2.47 seconds computation time for the above steps are given below:

enter image description here

So this is way much more faster than my first method.

2nd edit for i=1:10 //find c1,c2,c3 //c1 c2 and c3 change at each iteration

    a=(P-c1).*(P-c1); 
    b=(P-c2).*(P-c2); 
    c=(P-c3).*(P-c3);

    x=1./a; y=1./b; z=1./c;
    u = (x + y + z);
    u1 = x./u;
    u2 = y./u;    
    u3 = z./u;
end

Now the program computes in 0.808 seconds.

The four steps described above computes above very quickly.

enter image description here

I am sure it can be made even faster. Can you guys please help me to further optimize my code. It would be extremely helpful for matrices larger size than 512 such as 1024 , 2048 or likewise.

Thanks in advance.

Upvotes: 4

Views: 89

Answers (2)

NKN
NKN

Reputation: 6434

The first suggestion is:

if m = 2 and it is not changing, why you don't try these alternatives:

A*A

and if m = 2 then do you really need abs ?

this part that you are doing

1./a

is faster than

a.^(-1)

so I don't see any better option in this part.

Another thing you can try is this. instead of:

x=1./a; y=1./b; z=1./c;
    u = (x + y + z);
    u1 = x./u;
    u2 = y./u;    
    u3 = z./u;

You can have this:

    u = (x + y + z);
    u1 = 1./(a.*u);
    u2 = 1./(b.*u);    
    u3 = 1./(c.*u);

this way I guess it is a little bit faster by removing 3 variables. but the code becomes less readable.

Upvotes: 1

Nitish
Nitish

Reputation: 7186

Your current code is:

a=((abs(P-c1))^m); 
b=((abs(P-c2))^m); 
c=((abs(P-c3))^m);

x=1./a; y=1./b; z=1./c;
u = (x + y + z);
u1 = x./u;
u2 = y./u;    
u3 = z./u;

Firstly, realize that the absolute value function is multiplicative. So |AB| = |A|x|B|. Now, abs(P-C1)^m is equivalent to abs( (P-C1)^m ).

Just a preliminary glance at it suggests that some of the computation in the bottleneck can be reused. Specifically, since c1,c2 and c3 are constants, the computation can be sped up a little bit if you try to reuse them (at the expense of additional memory).

temp_P2 = P*P;
temp_PCA = P*ones(size(P));
temp_PCB = ones(size(P))*P;

a = abs(temp_P2 - c1*temp_PCA - c1*temp_PCB + c1^2 * length(P))

The computation of temp_PCA and temp_PCB can also be avoided since multiplication by a constant matrix always amounts to the construction of a rank 1 matrix with either constant rows or columns.

I don't claim that any of these modifications will speed up your code but they are definitely worth trying.

Upvotes: 1

Related Questions