nitowa
nitowa

Reputation: 1107

Paralellized execution in nested for using Cilk

I'm trying to implement a 2D-stencil algorithm that manipulates a matrix. For each field in the matrix, the fields above, below, left and right of it are to be added and divided by 4 in order to calculate the new value. This process may be iterated multiple times for a given matrix.

The program is written in C and compiles with the cilkplus gcc binary.

**Edit: I figured you might interested in the compiler flags:

~/cilkplus/bin/gcc -fcilkplus -lcilkrts -pedantic-errors -g -Wall -std=gnu11 -O3  `pkg-config --cflags glib-2.0 gsl`   -c -o sal_cilk_tst.o sal_cilk_tst.c

Please note that the real code involves some pointer arithmetic to keep everything consistent. The sequential implementation works. I'm omitting these steps here to enhance understandability.

A pseudocode would look something like this (No edge case handling):

for(int i = 0; i < iterations; i++){
   for(int j = 0; j < matrix.width; j++){
      for(int k = 0; k < matrix.height; k++){
         result_ matrix[j][k] = (matrix[j-1][k] + 
                                 matrix[j+1][k] +
                                 matrix[j]  [k+1] +
                                 matrix[j]  [k-1]) / 4;
      }
   }
   matrix = result_matrix;
}

The stencil calculation itself is then moved to the function apply_stencil(...)

for(int i = 0; i < iterations; i++){
   for(int j = 0; j < matrix.width; j++){
      for(int k = 0; k < matrix.height; k++){
         apply_stencil(matrix, result_matrix, j, k);
      }
   }
   matrix = result_matrix;
}

and parallelization is attempted:

for(int i = 0; i < iterations; i++){
   for(int j = 0; j < matrix.width; j++){
      cilk_for(int k = 0; k < matrix.height; k++){ /* <--- */
         apply_stencil(matrix, result_matrix, j, k);
      }
   }
   matrix = result_matrix;
}

This version compiles without errors/warning, but just straight out produces a Floating point exception when executed. In case you are wondering: It does not matter which of the for loops are made into cilk_for loops. All configurations (except no cilk_for) produce the same error.

the possible other method:

for(int i = 0; i < iterations; i++){
   for(int j = 0; j < matrix.width; j++){
      for(int k = 0; k < matrix.height; k++){
         cilk_spawn apply_stencil(matrix, result_matrix, j, k); /* <--- */
      }
   }
   cilk_sync; /* <--- */
   matrix = result_matrix;
}

This produces 3 warnings when compiled: i, j and k appear to be uninitialized. When trying to execute, the function which executes the matrix = result_matrix; step appears to be undefined.

Now for the actual question: Why and how does Cilk break my sequential code; or rather how can I prevent it from doing so?

The actual code is of course available too, should you be interested. However, this project is for an university class and therefore subject to plagiarism from other students who find this thread which is why I would prefer not to share it publicly.

**UPDATE:

As suggested I attempted to run the algorithm with only 1 worker thread, effectively making the cilk implementation sequential. This did, surprisingly enough, work out fine. However as soon as I change the number of workers to two, the familiar errors return.

I don't think this behavior is caused by race-conditions though. Since the working matrix is changed after each iteration and cilk_sync is called, there is effectively no critical section. All threads do not depend on data written by others in the same iteration.

The next steps I will attempt is to try out other versions of the cilkplus compiler, to see if its maybe an error on their side.

Upvotes: 2

Views: 387

Answers (2)

user5688343
user5688343

Reputation: 11

With regards to the floating point exception in a cilk_for, there are some issues that have been fixed in some versions of the Cilk Plus runtime. Is it possible that you are using an outdated version?

https://software.intel.com/en-us/forums/intel-cilk-plus/topic/558825

Also, what were the specific warning messages that are produced? There are some "uninitialized variable" warnings that occur with older versions of Cilk Plus GCC, which I thought were spurious warnings.

Upvotes: 1

The Cilk runtime uses a recursive divide and conquer algorithm to parallelize your loop. Essentially, it breaks the range in half, and recursively calls itself twice, spawning half and calling half.

As part of the initialization, it calculates a "grain size" which is the size of the minimum size it will break your range into. By default, that's loopRange/8P, where P is the number of cores.

One interesting experiment would be to set the number of Cilk workers to 1. When you do this, all of the cilk_for mechanism is excersized, but because there's only 1 worker, nothing gets stolen.

Another possibility is to try running your code under Cilkscreen - the Cilk race detector. Unfortunately only the cilkplus branch of GCC generates the annotations that Cilkscreen needs. Your choices are to use the Intel commpiler, or try using the cilkplus branch of GCC 4.9. Directions on how to pull down the code and build it are at the cilkplus.org website.

Upvotes: 0

Related Questions