Reputation: 255
Here is the situation: I don't really know what is going on when adding big numbers together to compute the average at the end.
Please correct me if there is a specific error to fix.
I have debugged, and inside the following loop data contains my normal data, but the variable "somme" sometimes gives me random numbers that are totally wrong. The same goes for "moyenne".
Something else: all the data are either 0 or a positive number, yet "somme" sometimes comes out negative!
#define Nb 230400
std::vector<std::array<double,480>> data(480);
double somme=0;
double moyenne=0;
for (int i=0;i<480;i++)
{
    for (int j=0;j<480;j++)
        somme=somme+data[i][j];
}
moyenne=somme/Nb;
Upvotes: 0
Views: 1140
Reputation: 153977
First, with the code you've given, there is no way you can get negative results (at least with the IEEE floating point used on PCs and the usual Unix machines); if you overflow, you will get the special value Inf (but you cannot overflow if the data are in the ranges you specify). The results may be wrong, due to rounding errors, but they will still have a lower bound of 0.
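As a minimal sketch of that overflow behaviour (assuming IEEE 754 doubles; this is not taken from your code):

#include <iostream>
#include <limits>

int main()
{
    // Adding two huge positive values overflows to +Inf, never to a negative number.
    double somme = std::numeric_limits<double>::max();
    somme += std::numeric_limits<double>::max();
    std::cout << somme << '\n';    // prints "inf" on IEEE 754 platforms
}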
You haven't specified how you determined that the results were negative, nor how you ensure that the input data is in range, so I can only speculate; but a quick way to check the input is to assert the expected range inside the loop:
for ( int i = 0; i < 480; ++ i ) {
    for ( int j = 0; j < 480; ++ j ) {
        assert( data[i][j] >= 0.0 && data[i][j] < 200000.0 );
        somme += data[i][j];
    }
}
For the rest, your algorithm isn't particularly accurate. Some quick tests (filling your data structure with random values in the range [0...2e5)) show less than 15 digits accuracy in the final result. (Of course, this may be acceptable. Most physical data that you acquire won't have more than 3 or 4 digits accuracy anyway, and you may not be displaying more than 6. In which case...)
The accuracy issue is actually curious, and shows just how tricky these things can be. I used three functions for my tests:
// Basically what you did...
double
av1( std::vector<std::array<double, cols>> const& data )
{
double somme = 0.0;
for ( int i = 0; i != data.size(); ++ i ) {
for ( int j = 0; j != cols; ++j ) {
somme += data[i][j];
}
}
return somme / (data.size() * cols);
}
// The natural way of writing it in C++11...
double
av2( std::vector<std::array<double, cols>> const& data )
{
return std::accumulate(
data.begin(),
data.end(),
0.0,
[]( double a, std::array<double, cols> const& b ) {
return a + std::accumulate( b.begin(), b.end(), 0.0 );
} ) / (data.size() * cols);
}
// Using the Kahan summation algorithm...
double
av3( std::vector<std::array<double, cols>> const& data )
{
double somme = 0.0;
double c = 0.0;
for ( int i = 0; i != data.size(); ++ i ) {
for ( int j = 0; j != cols; ++j ) {
double y = data[i][j] - c;
double t = somme + y;
c = (t - somme) - y;
somme = t;
}
}
return somme / (data.size() * cols);
}
(In all of the tests, cols == 480 and data.size() == 480.)
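A sketch of the kind of test harness used (assuming cols is declared as a compile-time constant before the three functions; the exact filling code may differ):

#include <array>
#include <cstddef>
#include <iostream>
#include <random>
#include <vector>

std::size_t const cols = 480;   // needed as the std::array template parameter

// av1, av2 and av3 as defined above...

int main()
{
    std::mt19937 gen( 42 );
    std::uniform_real_distribution<double> dist( 0.0, 2e5 );   // values in [0...2e5)
    std::vector<std::array<double, cols>> data( 480 );
    for ( auto& row : data ) {
        for ( auto& value : row ) {
            value = dist( gen );
        }
    }
    std::cout.precision( 17 );
    std::cout << av1( data ) << '\n'
              << av2( data ) << '\n'
              << av3( data ) << '\n';
}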
The code was compiled using VC11, with option /O2. The interesting thing was that av2 was systematically more accurate than your code, usually down to the 17th digit (2 or 3 bits in the internal representation), whereas av1 was often off by as much as 2 or 3 in the 15th digit (8 or 9 bits in the internal representation). The reason for this is that your code systematically collects into xmm1, across all 480*480 values, whereas av2 collects each row separately; this results in fewer additions with a large difference of magnitude. (As av1 approaches the end of the data, somme approaches 2.3e10, which is significantly larger than any of the data elements.) Using something like:
double
moyenne( std::vector<std::array<double, cols>> const& data )
{
double outerSum = 0.0;
for ( int i = 0; i != data.size(); ++ i ) {
double innerSum = 0.0;
for ( int j = 0; j != cols; ++ j ) {
innerSum += data[i][j];
}
outerSum += innerSum;
}
return outerSum / (data.size() * cols);
}
should give results equivalent to av2. (But if you need the accuracy, you really should go with the Kahan summation algorithm.)
(I'm tempted to add that if any of this surprises you, you shouldn't be using floating point anyway.)
Upvotes: 2
Reputation: 2909
This may also be caused by floating point errors. The floating point error can be really large if you add numbers of very different magnitudes (such as 10e-10 + 10), while the error is smaller if the magnitudes are similar.
If all the numbers are large, your code should work (if there is no overflow). If not, you may increase accuracy by adding the numbers in sorted order. Pseudocode:
array a;
sort(a);
foreach i in a:
    somme += i
somme /= count(a)
The reason is that the sum of the smallest numbers is of a magnitude closer to that of the next bigger number, so the rounding error stays smaller.
To avoid overflow you may divide each i by count(a) instead of dividing the result. This should not change the accuracy if no overflow happens; see the sketch after this paragraph.
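A minimal C++ sketch of both ideas (sorting before summing, and dividing each element by the count up front); flattening the 480x480 structure into one vector is just for illustration:

#include <algorithm>
#include <array>
#include <vector>

double averageSorted( std::vector<std::array<double, 480>> const& data )
{
    // Flatten, then sort ascending so the small values are added first.
    std::vector<double> a;
    a.reserve( 480 * 480 );
    for ( auto const& row : data ) {
        a.insert( a.end(), row.begin(), row.end() );
    }
    std::sort( a.begin(), a.end() );

    double moyenne = 0.0;
    for ( double x : a ) {
        moyenne += x / a.size();   // dividing each element keeps the running sum small
    }
    return moyenne;
}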
PS: If you sort the array descending or reverse the loop you can maximize your error!
Upvotes: 0
Reputation: 608
It's possible that an overflow happened and changed the sign bit, so the value looks like a negative number. Try "long double" instead of "double" if you're dealing with really big numbers.
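A short sketch of that suggestion, simply widening the accumulator (note that on MSVC long double has the same precision as double, so the gain depends on the platform):

// Same loop as in the question, but accumulating in long double.
long double somme = 0.0L;
for ( int i = 0; i < 480; ++i ) {
    for ( int j = 0; j < 480; ++j ) {
        somme += data[i][j];
    }
}
double moyenne = static_cast<double>( somme / Nb );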
Upvotes: 1