olabaz

Reputation: 21

OpenMP C - Parallel Loop Averaging Gives Different Results

I am trying to parallelize my C program that reads 5000 files of data and averages each line across the files. For example:

File 1:

1
2

File 2:

3
4

Output would be:

2
3

The program works fine until I parallelize it and then I get different answers each run. The program looks like this:

#pragma omp parallel for        
for(int i=0; i<numFiles; i++){
    //Process FILES
    processFile(argv[i+2],alpha,entropySum,entropy2Sum);
}

where processFile opens the file named in argv and adds the data on line i to the array entropySum[i] and its square to the array entropy2Sum[i]. After adding all the data, I output the results using

for(int i=0; i<=TF; i++){
    double avg=0, avg2=0, stdDev=0;
    avg = entropySum[i]/numFiles;
    avg2 = entropy2Sum[i]/numFiles;
    stdDev = sqrt(avg2-pow(avg,2));
    printf("%d\t%.12e\t%.12e\n",i,avg,stdDev);
}

This is an example of the different results I am getting:

$ ./calcEntropy 1 data/2019_01_07/L06/p_00/data*
0   0.000000000000e+00  0.000000000000e+00
1   3.805323999338e-01  1.739913615303e-01
2   9.195334281358e-01  1.935111917992e-01
3   1.263129144934e+00  1.592392740809e-01
4   1.437299446640e+00  1.069415304025e-01
**5 1.519477007427e+00  7.899186640180e-02**
6   1.540955646645e+00  8.134009585552e-02
7   1.562133860024e+00  6.672275284387e-02
**8 1.573666031035e+00  7.200051995992e-02**
9   1.577305749778e+00  6.905825416624e-02
10  1.584251429260e+00  7.170811783630e-02
11  1.606099624837e+00  7.026618801497e-02
12  1.587069341648e+00  6.714269884875e-02

$ ./calcEntropy 1 data/2019_01_07/L06/p_00/data*
0   0.000000000000e+00  0.000000000000e+00
1   3.805323999338e-01  1.739913615303e-01
2   9.195334281358e-01  1.935111917992e-01
3   1.263129144934e+00  1.592392740809e-01
4   1.437299446640e+00  1.069415304025e-01
**5 1.519114903273e+00  8.255618715434e-02**
6   1.540955646645e+00  8.134009585553e-02
7   1.562133860024e+00  6.672275284389e-02
**8 1.573666031035e+00  6.715192373223e-02**
9   1.577305749778e+00  6.905825416623e-02
10  1.584251429260e+00  7.170811783631e-02
11  1.606099624837e+00  7.026618801500e-02
12  1.587069341648e+00  6.714269884876e-02

After checking things online, I feel like I should probably be using a reduction? But I am not exactly sure how to implement it in this loop with two arrays being incremented (entropySum and entropy2Sum), especially since the incrementing happens inside the function processFile.

Please let me know if you need more information or code. Thanks.

Upvotes: 0

Views: 183

Answers (1)

Nick ODell

Reputation: 25269

I'm guessing the problem is that floating point math is not associative. See https://en.wikipedia.org/wiki/Associative_property#Nonassociativity_of_floating_point_calculation

In other words, if you have floats a, b, and c, then (a + b) + c does not always equal a + (b + c). The order that you do the addition matters. However, if you use #pragma omp parallel for, you're telling the compiler that order doesn't matter.

Here are some approaches to fixing this:

  1. Do the addition exactly with GMP. The GMP library lets you convert a double into an exact rational number. You can then add the rationals together, and the result will be the same no matter what order the additions happen in, because there is no rounding. See https://gmplib.org/manual/Rational-Number-Functions.html#Rational-Number-Functions

    After adding, you can convert the rational back into a double, which is not exact, but it is inexact in the same way every time.
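    A minimal sketch of that approach, assuming libgmp is installed (link with -lgmp); the hard-coded values are just placeholders for your per-file data:

    ```c
    #include <gmp.h>
    #include <stdio.h>

    int main(void) {
        double vals[] = {0.1, 0.2, 0.3};
        mpq_t sum, term;
        mpq_init(sum);   /* initialized to 0/1 */
        mpq_init(term);

        /* mpq_set_d converts a double to an exact rational, so the
           additions below are exact and any summation order gives the
           same result. */
        for (int i = 0; i < 3; i++) {
            mpq_set_d(term, vals[i]);
            mpq_add(sum, sum, term);
        }

        /* One rounding, at the very end, identical on every run. */
        printf("%.17g\n", mpq_get_d(sum));

        mpq_clear(sum);
        mpq_clear(term);
        return 0;
    }
    ```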

  2. Do the addition exactly with fixed-point. This has the advantage that you don't need to use an external library.

    Convert numbers into fixed point like this:

    #define SCALE_FACTOR 100
    // Convert from double to fixed point
    // Example: 15.46   becomes 1546
    // Example: 3.14159 becomes 314
    int convert_d_to_fixed(double d) {
        return (int) (d * SCALE_FACTOR);
    }
    

    Converting into fixed point is inexact. However, once you've converted, all addition and subtraction in fixed point is exact, because you're using integer arithmetic.

    int add_fixed(int a, int b) {
        return a + b;
    }
    

    Once you're done adding, you can convert back:

    double convert_fixed_to_d(int f) {
        return ((double) f) / SCALE_FACTOR;
    }
    

    This method is not exact, but it is inexact in precisely the same way every time, no matter what order the addition is done.

  3. Load every number into memory, then do the addition. Create a two-dimensional array. Each column represents one file. Each row is a line from every file. In pseudocode:

    #pragma omp parallel for        
    for(int i=0; i<numFiles; i++){
        load file i into column i
    }
    #pragma omp parallel for        
    for(int j=0; j<numRows; j++){
        sum row j into entropySum[j]
        sum square of elements in row j into entropy2Sum[j]
    }
    

    This always does the additions in the same order, but still allows parallelism while loading the data and while adding the data.

    Downside: The entire dataset must be loaded into memory at once. This costs (number of files) * (number of rows) * 8 bytes.
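    Here is a self-contained sketch of that two-phase structure. The "file" contents are faked in memory so it compiles and runs on its own; in your program, phase 1 would be the real file reads. Build with -fopenmp (the pragmas are simply ignored without it, and the result is the same either way):

    ```c
    #include <math.h>
    #include <stdio.h>

    #define NUM_FILES 4
    #define NUM_ROWS  3

    int main(void) {
        /* data[i][j]: value on line j of file i */
        static double data[NUM_FILES][NUM_ROWS];
        double entropySum[NUM_ROWS] = {0}, entropy2Sum[NUM_ROWS] = {0};

        /* Phase 1: "load" each file into its own column. Safe to run in
           parallel: iteration i writes only data[i][...]. */
        #pragma omp parallel for
        for (int i = 0; i < NUM_FILES; i++)
            for (int j = 0; j < NUM_ROWS; j++)
                data[i][j] = (double)(i + 1) * (j + 1);  /* fake contents */

        /* Phase 2: sum each row. Iteration j writes only entropySum[j] /
           entropy2Sum[j], and the inner loop over i always adds in the
           same order, so the result is identical on every run. */
        #pragma omp parallel for
        for (int j = 0; j < NUM_ROWS; j++) {
            for (int i = 0; i < NUM_FILES; i++) {
                entropySum[j]  += data[i][j];
                entropy2Sum[j] += data[i][j] * data[i][j];
            }
        }

        for (int j = 0; j < NUM_ROWS; j++) {
            double avg  = entropySum[j] / NUM_FILES;   /* row 0: 2.5 */
            double avg2 = entropy2Sum[j] / NUM_FILES;  /* row 0: 7.5 */
            printf("%d\t%.12e\t%.12e\n", j, avg, sqrt(avg2 - avg * avg));
        }
        return 0;
    }
    ```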

Upvotes: 2
