surya kiran

Reputation: 475

OpenMP optimization with return condition

I'm new to OpenMP parallel programming and am finding it difficult to optimize my function, which should return -1 if any column contains only zeros.

Here is my function. The source matrix has some values, and norm_matrix is constructed as a diagonal matrix where each column holds the maximum value from the corresponding column of the source matrix.

I'm running this on a matrix of size 1000. For example, if my source matrix is

 3   2   3   3
 2   4   5   5
 1   4  91   8
32  12   9  63 

then the computed norm_matrix will be

32   0   0   0
 0  12   0   0 
 0   0  91   0 
 0   0   0  63 

If any column contains all zeros then the function should return -1.

Here is my function that I'm trying to optimize in OpenMP:

int statistic_norm_matrix(double* source_matrix, double* norm_matrix, int size) {
   int col, row;

   for (col = 0; col < size; col++) {
      norm_matrix[col * size + col] = source_matrix[col];

      for (row = 0; row < size; row++) {
         norm_matrix[col * size + col] = fmax(norm_matrix[col * size + col], source_matrix[row * size + col]);
      }

      if (norm_matrix[col * size + col] == 0) {
         printf("can't process a matrix where the max col value is 0");
         return -1;
      }

      norm_matrix[col * size + col] = (1 / norm_matrix[col * size + col]);

      if (col == size - 1) {
         print_matrix("matrix-norm", norm_matrix, size);
      }
   }
   return 0;
}

This is an attempt at parallelizing with OpenMP, but I see no performance difference:

int statistic_norm_matrix(double* source_matrix, double* norm_matrix, int size) {
   int col, row;
   int flag = 0;

   #pragma omp parallel for shared(source_matrix,norm_matrix) private(col,row)
   for (col = 0; col < size; col++) {
      norm_matrix[col * size + col] = source_matrix[col];

      for (row = 0; row < size; row++) {
         norm_matrix[col * size + col] = fmax(norm_matrix[col * size + col], source_matrix[row * size + col]);
      }

      if (norm_matrix[col * size + col] == 0) {
         printf("can't process a matrix where the max col value is 0");
         flag = -1;
      }

      norm_matrix[col * size + col] = (1 / norm_matrix[col * size + col]);
   }

   print_matrix("matrix-norm", norm_matrix, size);
   return flag;
}

Upvotes: 1

Views: 155

Answers (1)

dreamcrash

Reputation: 51393

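When norm_matrix[col * size + col] == 0, the statement norm_matrix[col * size + col] = (1 / norm_matrix[col * size + col]); divides by zero, which for doubles yields inf on IEEE 754 platforms rather than a usable value. Assuming that you want to leave the loop as soon as flag is set to -1, you can use #pragma omp cancel for. Note that cancellation only takes effect when it is enabled at run time, for example by setting the environment variable OMP_CANCELLATION=true; otherwise the directive is ignored.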

int statistic_norm_matrix(double* source_matrix, double* norm_matrix, int size) {
        int flag = 0;
        #pragma omp parallel for shared(source_matrix, norm_matrix, flag)
        for (int col = 0; col < size; col++) {
             norm_matrix[col * size + col] = source_matrix[col];
             for (int row = 0; row < size; row++) {
                 norm_matrix[col * size + col] = fmax(norm_matrix[col * size + col], source_matrix[row * size + col]);
             }
             if (norm_matrix[col * size + col] == 0) {
                    // Several threads may reach this point, so make the write atomic.
                    #pragma omp atomic write
                    flag = -1;
                    // If cancellation is enabled, this thread leaves the loop here.
                    #pragma omp cancel for
                    // Reached only when cancellation is disabled: skip the division by zero.
                    continue;
             }

             norm_matrix[col * size + col] = (1 / norm_matrix[col * size + col]);
        }

     print_matrix("matrix-norm", norm_matrix, size);
     return flag;
}
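If depending on cancellation being enabled is not desirable, one possible alternative is to let every thread finish its columns and combine the "found an all-zero column" result with a reduction. The following is only a sketch under stated assumptions: the name statistic_norm_matrix_reduction is made up for illustration, the call to print_matrix is left out, a plain comparison stands in for fmax, and the running maximum is kept in a local variable instead of being written to norm_matrix on every row. It was not part of the benchmark in this answer.

```c
/* Sketch only: hypothetical reduction-based variant, not the code that was benchmarked. */
int statistic_norm_matrix_reduction(const double *source_matrix,
                                    double *norm_matrix, int size) {
    int empty_columns = 0;  /* number of columns whose maximum is 0 */

    /* Each thread counts its own empty columns; OpenMP sums the counts at the end. */
    #pragma omp parallel for reduction(+:empty_columns)
    for (int col = 0; col < size; col++) {
        double col_max = source_matrix[col];  /* start from row 0 of this column */

        for (int row = 1; row < size; row++) {
            double value = source_matrix[row * size + col];
            if (value > col_max) {
                col_max = value;
            }
        }

        if (col_max == 0) {
            empty_columns++;  /* record the failure, skip the division by zero */
            continue;
        }

        norm_matrix[col * size + col] = 1.0 / col_max;
    }

    return empty_columns > 0 ? -1 : 0;
}
```

The trade-off is that the loop always runs to completion, even when the first column is already all zeros, but the return value no longer depends on any run-time setting and there is no shared flag to protect.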

I did a quick benchmark on my machine (4 cores), timing only statistic_norm_matrix and excluding the call to print_matrix. For a 1000x1000 matrix:

1 thread:  0.010852 s
2 threads: 0.005325 s
4 threads: 0.002891 s

For a 10000x10000 matrix:

1 thread:  1.937415 s
2 threads: 1.052908 s
4 threads: 0.807185 s

The tests with 1 thread were done without any OpenMP directives. So either something is wrong with the way you are compiling or running the program, and your code is not actually running in parallel, or something is wrong with the way you are measuring the times.
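Both possibilities can be checked with a few lines of code. The sketch below is an illustration, not the harness used for the numbers above: openmp_thread_count and wall_time are hypothetical helper names. It relies on the _OPENMP macro, which is defined only when the compiler has OpenMP enabled (for GCC, that means passing -fopenmp), and on omp_get_wtime for wall-clock time. On POSIX systems clock() reports CPU time summed over all threads, so timing a parallel loop with it can hide a real speedup; it is used here only as the serial fallback.

```c
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Number of threads a parallel region actually gets.
   A result of 1 means the loop is effectively running serially. */
int openmp_thread_count(void) {
    int threads = 1;
#ifdef _OPENMP
    #pragma omp parallel
    {
        #pragma omp single
        threads = omp_get_num_threads();
    }
#endif
    return threads;
}

/* Wall-clock seconds when OpenMP is enabled; CPU seconds otherwise. */
double wall_time(void) {
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}
```

A typical use would be to print openmp_thread_count() once at startup, then take wall_time() immediately before and after the call to statistic_norm_matrix and report the difference. If the count stays at 1 even with OMP_NUM_THREADS set to a larger value, the program was most likely built without OpenMP support.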

Upvotes: 1
