MelMed

Reputation: 255

Summing huge numbers

Here is the situation: I don't really understand what is going on when I add big numbers to each other to compute the overall average at the end.

Please correct me if there is a specific error I should fix.

I have debugged, and inside the following loop data does contain my normal data, but the variable "somme" sometimes gives me seemingly random, totally wrong numbers. The same goes for "moyenne".

Something else: all the data values are either 0 or positive, yet somme sometimes gives a negative number!

#include <array>
#include <vector>

#define Nb 230400    // 480 * 480
std::vector<std::array<double, 480>> data(480);

    double somme = 0;
    double moyenne = 0;
    for (int i = 0; i < 480; i++)
    {
        for (int j = 0; j < 480; j++)
            somme = somme + data[i][j];
    }
    moyenne = somme / Nb;

Upvotes: 0

Views: 1140

Answers (3)

James Kanze

Reputation: 153977

First, with the code you've given, there is no way you can get negative results (at least with the IEEE floating point used on PCs and the usual Unix machines); if you overflow, you will get the special value Inf (but you cannot overflow if the data are in the ranges you specify). The results may be wrong, due to rounding errors, but they will still have a lower bound of 0.
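
As a quick illustration of that point, here is a minimal sketch (assuming IEEE 754 doubles, as on a typical PC) showing that overflowing a sum of positive values produces inf, not a negative number:

#include <cmath>
#include <iostream>
#include <limits>

int main()
{
    double somme = std::numeric_limits<double>::max();
    somme += somme;                             //  overflow of a sum of positive values
    std::cout << somme << '\n';                 //  prints inf, not a negative number
    std::cout << std::isinf( somme ) << '\n';   //  prints 1
    std::cout << (somme < 0.0) << '\n';         //  prints 0
}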

You haven't specified how you determined that the results were negative, nor how you ensure that the input data is in range, so I can only speculate; but the following are distinct possibilities:

  • You compiled with optimization turned on, and you are looking at the values with the debugger. The debugger will often show wrong values (uninitialized memory) when looking at optimized code.
  • You have undefined behavior (a pointer problem) elsewhere, which corrupts the memory you're looking at here. 99% of the time, this is the explanation of otherwise unexplainable behavior, but I'm somewhat dubious here: provided there is nothing else in the code sequence you posted, and there are no other threads running, there are no pointers (at least that you manipulate) to misuse.
  • You failed to properly initialize the data. You might want to add an assert in the innermost loop, just to be sure:
        //  (requires <cassert>)
        for ( int i = 0; i < 480; ++ i ) {
            for ( int j = 0; j < 480; ++ j ) {
                assert( data[i][j] >= 0.0 && data[i][j] < 200000.0 );
                somme += data[i][j];
            }
        }

For the rest, your algorithm isn't particularly accurate. Some quick tests (filling your data structure with random values in the range [0...2e5)) show fewer than 15 digits of accuracy in the final result. (Of course, this may be acceptable. Most physical data that you acquire won't have more than 3 or 4 digits of accuracy anyway, and you may not be displaying more than 6. In which case...)

The accuracy issue is actually curious, and shows just how tricky these things can be. I used three functions for my tests:

#include <array>
#include <numeric>
#include <vector>

int const cols = 480;

//  Basically what you did...
double
av1( std::vector<std::array<double, cols>> const& data )
{
    double somme = 0.0;
    for ( int i = 0; i != data.size(); ++ i ) {
        for ( int j = 0; j != cols; ++j ) {
            somme += data[i][j];
        }
    }
    return somme / (data.size() * cols);
}

//  The natural way of writing it in C++11...
double
av2( std::vector<std::array<double, cols>> const& data )
{
    return std::accumulate( 
        data.begin(),
        data.end(),
        0.0,
        []( double a, std::array<double, cols> const& b ) {
            return a + std::accumulate( b.begin(), b.end(), 0.0 );
        } ) / (data.size() * cols);
}

//  Using the Kahan summation algorithm...
double
av3( std::vector<std::array<double, cols>> const& data )
{
    double somme = 0.0;
    double c = 0.0;
    for ( int i = 0; i != data.size(); ++ i ) {
        for ( int j = 0; j != cols; ++j ) {
            double y = data[i][j] - c;
            double t = somme + y;
            c = (t - somme) - y;
            somme = t;
        }
    }
    return somme / (data.size() * cols);
}

(In all of the tests, cols == 480 and data.size() == 480.)
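
For completeness, here is a sketch of a possible test driver (to go with the headers and functions above); the random fill and the long double reference sum are one way of doing the comparison, not necessarily the exact code behind the figures below:

#include <iostream>
#include <random>

int main()
{
    std::vector<std::array<double, cols>> data( 480 );
    std::mt19937 gen( 42 );
    std::uniform_real_distribution<double> dist( 0.0, 2e5 );
    for ( auto& row : data ) {
        for ( auto& value : row ) {
            value = dist( gen );
        }
    }

    long double reference = 0.0L;       //  higher precision, for comparison only
    for ( auto const& row : data ) {
        for ( double value : row ) {
            reference += value;
        }
    }
    reference /= data.size() * cols;

    std::cout.precision( 17 );
    std::cout << "av1: " << av1( data ) - double( reference ) << '\n';
    std::cout << "av2: " << av2( data ) - double( reference ) << '\n';
    std::cout << "av3: " << av3( data ) - double( reference ) << '\n';
}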

The code was compiled using VC11, with option /O2. The interesting thing was that av2 was systematically more accurate than your code, usually down to the 17th digit (2 or 3 bits in the internal representation), whereas av1 was often off by as much as 2 or 3 in the 15th digit (8 or 9 bits in the internal representation). The reason for this is that your code systematically collects into xmm1, across all 480*480 values, whereas av2 collects each row separately; this results in fewer additions between values of widely different magnitude. (As av1 approaches the end of the data, somme approaches 2.3e10, which is significantly larger than any of the data elements.) Using something like:

double
moyenne( std::vector<std::array<double, cols>> const& data )
{
    double outerSum = 0.0;
    for ( int i = 0; i != data.size(); ++ i ) {
        double innerSum = 0.0;
        for ( int j = 0; j != cols; ++ j ) {
            innerSum += data[i][j];
        }
        outerSum += innerSum;
    }
    return outerSum / (data.size() * cols);
}

should give results equivalent to av2. (But if you need the accuracy, you really should go with the Kahan summation algorithm.)

(I'm tempted to add that if any of this surprises you, you shouldn't be using floating point anyway.)

Upvotes: 2

urzeit

Reputation: 2909

This may also be caused by floating point error. The rounding error can be really large if you add numbers of very different magnitudes (such as 10e-10 + 10), while the error is smaller if the magnitudes are similar.
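
A minimal illustration of the effect (assuming ordinary IEEE doubles):

#include <iostream>

int main()
{
    double big   = 1.0e16;
    double small = 1.0;
    std::cout << (big + small == big) << '\n';            // prints 1: small is lost completely
    std::cout << (small + small == 2.0 * small) << '\n';  // prints 1: similar magnitudes add exactly
}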

If all numbers are large, your code should work (as long as there is no overflow). If not, you may increase accuracy by adding the numbers in sorted order. Pseudocode:

array a;
sort(a);            // ascending
foreach i in a:
    somme += i
somme /= count(a)

The reason is that the sum of the smallest numbers ends up with a magnitude similar to that of the next bigger number, so the rounding error stays smaller.

To avoid overflows you may divide each i by count(a) instead of dividing the result. This should not change the accuracy if no overflow happens.
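
A C++ sketch of both suggestions combined, written against the vector<array<double, 480>> layout from the question (the flattening into one vector and the function name averageSorted are just for illustration):

#include <algorithm>
#include <array>
#include <vector>

//  Sketch: flatten, sort ascending, and scale each term by the count.
double averageSorted( std::vector<std::array<double, 480>> const& data )
{
    std::vector<double> flat;
    flat.reserve( data.size() * 480 );
    for ( auto const& row : data ) {
        flat.insert( flat.end(), row.begin(), row.end() );
    }
    std::sort( flat.begin(), flat.end() );      //  ascending: smallest numbers first
    double const n = static_cast<double>( flat.size() );
    double moyenne = 0.0;
    for ( double x : flat ) {
        moyenne += x / n;                       //  divide each term instead of the result
    }
    return moyenne;
}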

PS: If you sort the array in descending order, or reverse the loop, you can maximize your error!

Upvotes: 0

Manas

Reputation: 608

It's possible that a data overflow happened. The overflow changed the sign bit so it looks like a negative number. Try "long double" instead of "double" if you're dealing with really big numbers.
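
Applied to the loop from the question, that suggestion would look roughly like this (a sketch only; note that long double is wider than double on some platforms, e.g. x86 gcc, but is the same 64-bit type under MSVC):

long double somme = 0.0L;               //  wider accumulator, where the platform provides one
for ( int i = 0; i < 480; i++ )
{
    for ( int j = 0; j < 480; j++ )
        somme += data[i][j];
}
double moyenne = static_cast<double>( somme / Nb );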

Upvotes: 1
