Griff

Reputation: 2124

speed up program by doing operation inside or outside of loop

I have a question about Fortran optimisation (and probably about programs in general):

There are two ways to carry out a basic operation: over the entire vector at once, or element by element, i.e.

x = array(:,1)
y = array(:,2)
z = array(:,3)

x1 = floor(x/k) + 1
y1 = floor(y/k) + 1
z1 = floor(z/k) + 1

OR

do i = 1, n
   x1(i) = floor(x(i)/k) + 1
   y1(i) = floor(y(i)/k) + 1
   z1(i) = floor(z(i)/k) + 1
end do

I can use OpenMP on the loop because there are 100 million entries, but I'm not sure it would help. Would it be faster to do it in the loop or outside of the loop? Experience and common sense tell me to do it outside. There are other components to the program, but I'm finding that most of the time is taken up by creating the new vectors x1, y1, z1 because there are so many x, y, z values to convert.

Upvotes: 1

Views: 120

Answers (2)

High Performance Mark

Reputation: 78324

If you're concerned with execution speed then I suggest you profile a version of the code which dispenses with what seem to be temporary copies of the array slices, x, y and z. Creating them requires copying a lot of data around the memory of your machine. You could simply write

x1 = floor(array(:,1)/k) + 1
y1 = floor(array(:,2)/k) + 1
z1 = floor(array(:,3)/k) + 1

Your compiler ought to be able to do this without making a copy of array but this is something you ought to check.
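
One way to check this is to time both forms on your own machine. The following is only a minimal sketch: it assumes array is a default real array, and the size n, the divisor k and the program name are hypothetical values chosen for illustration, not taken from the question.

```fortran
program slice_vs_loop
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  integer, parameter :: n = 1000000   ! assumed size, far smaller than 100 million
  real, parameter :: k = 2.5          ! assumed divisor
  real, allocatable :: array(:,:)
  integer, allocatable :: x1(:), x1_loop(:)
  integer :: i
  integer(int64) :: t0, t1, t2, rate

  allocate(array(n,3), x1(n), x1_loop(n))
  call random_number(array)
  array = 100.0 * array               ! spread the values over [0, 100)

  call system_clock(t0, rate)
  ! Whole-array form, applied directly to the slice (no x temporary in the source)
  x1 = floor(array(:,1) / k) + 1
  call system_clock(t1)
  ! Explicit loop form
  do i = 1, n
     x1_loop(i) = floor(array(i,1) / k) + 1
  end do
  call system_clock(t2)

  ! Both forms must give identical results
  if (any(x1 /= x1_loop)) error stop "whole-array and loop results differ"

  print *, "whole-array time (s):", real(t1 - t0) / real(rate)
  print *, "loop time (s):       ", real(t2 - t1) / real(rate)
end program slice_vs_loop
```

Timings from a single run like this are noisy, so repeat the measurement and use the optimisation flags you build with in production before drawing conclusions.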

Depending on elements of your code which are not shown in your question, you might even be able to declare x1, y1 and z1 to be pointers and write something like this:

array_over_k = floor(array/k) + 1
x1 => array_over_k(:,1)
y1 => array_over_k(:,2)
z1 => array_over_k(:,3)
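
For the pointer assignments above to compile, array_over_k has to carry the target attribute. A self-contained sketch of the idea follows, assuming real input data and integer results; the size n, the divisor k and the test data are made up for illustration.

```fortran
program pointer_views
  implicit none
  integer, parameter :: n = 5          ! assumed small size for illustration
  real, parameter :: k = 2.0           ! assumed divisor
  real :: array(n,3)
  integer, target :: array_over_k(n,3) ! target is required for pointer association
  integer, pointer :: x1(:), y1(:), z1(:)
  integer :: i

  ! Fill array with 1.0, 2.0, ..., 15.0 in column-major order
  array = reshape([(real(i), i = 1, 3*n)], [n, 3])

  ! One pass over the whole array, no per-column copies
  array_over_k = floor(array / k) + 1

  ! The pointers are views onto the columns, not copies
  x1 => array_over_k(:,1)
  y1 => array_over_k(:,2)
  z1 => array_over_k(:,3)

  if (.not. associated(x1, array_over_k(:,1))) error stop "x1 is not a view of column 1"
  if (size(z1) /= n) error stop "unexpected view size"

  print *, x1
  print *, y1
  print *, z1
end program pointer_views
```

Because x1, y1 and z1 alias array_over_k, writing through one of them changes array_over_k too, which may or may not suit the rest of the program.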

Whichever way you do the calculations you still gotta do the calculations, but do you need to make all those copies of elements of the arrays?

Upvotes: 2

This will be memory-bandwidth bound. I would go the first way (whole-array operations), provided the arrays are separate in memory (i.e. not some weird non-contiguous pointers). But it's best to try it and measure; without a profiler it is easy to be wrong. Also, you can use OpenMP or just autoparallelization for the first version as well.
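
As a rough sketch of how OpenMP could be applied to either form: the loop can take a parallel do directive and the whole-array statement can sit inside a parallel workshare construct. The size n, the divisor k and the variable names here are assumptions for illustration, and since the operation is likely bandwidth bound the speed-up may be modest.

```fortran
program omp_floor
  implicit none
  integer, parameter :: n = 1000000    ! assumed size for illustration
  real, parameter :: k = 2.5           ! assumed divisor
  real, allocatable :: array(:,:)
  integer, allocatable :: x1(:), x1_ws(:)
  integer :: i

  allocate(array(n,3), x1(n), x1_ws(n))
  call random_number(array)
  array = 100.0 * array

  ! Loop form: iterations are divided among the threads
  !$omp parallel do
  do i = 1, n
     x1(i) = floor(array(i,1) / k) + 1
  end do
  !$omp end parallel do

  ! Whole-array form: workshare lets the runtime split the array assignment
  !$omp parallel workshare
  x1_ws = floor(array(:,1) / k) + 1
  !$omp end parallel workshare

  if (any(x1 /= x1_ws)) error stop "loop and workshare results differ"
  print *, "results agree for", n, "elements"
end program omp_floor
```

Compiled without an OpenMP flag (for example -fopenmp with gfortran), the directives are treated as comments and the program runs serially, which makes it easy to compare the two builds.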

Upvotes: 0
