Reputation: 21514
I was comparing two algorithms computing the average of random numbers.
I suppose there's nothing revolutionary here, and I'm not a mathematician so I can't put a name on those two algorithms.
Here is my code:
#include <iostream>
#include <iomanip>
#include <cstdlib>

// Method 1: accumulate a running total, divide once at the end.
class Average1
{
public:
    Average1() : total( 0 ), count( 0 ) {}
    void add( double value )
    {
        total += value;
        count++;
    }
    double average()
    {
        return total/count;
    }
private:
    double total;
    size_t count;
};

// Method 2: maintain the running average itself, rescaling it on every add.
class Average2
{
public:
    Average2() : av( 0 ), count( 0 ) {}
    void add( double value )
    {
        av = (av*count + value)/(count+1);
        count++;
    }
    double average()
    {
        return av;
    }
private:
    double av;
    size_t count;
};

void compare()
{
    Average1 av1;
    Average2 av2;
    double temp;
    for ( size_t i = 0; i != 100000000; ++i )
    {
        // the same uniform value in [0,1] is fed to both methods
        temp = static_cast<double>(std::rand()) / static_cast<double>(RAND_MAX);
        av1.add( temp );
        av2.add( temp );
    }
    std::cout << std::setprecision(20) << av1.average() << std::endl;
    std::cout << std::setprecision(20) << av2.average() << std::endl;
}

int main()
{
    compare();
    return 0;
}
Output is:
0.50001084285722707801
0.50001084285744978875
The difference is certainly due to double type precision.
In the end, which one is the right method? Which one gives the real mathematical average (or the closest to it)?
Upvotes: 4
Views: 1066
Reputation: 1512
My own thinking is that both methods manipulate a quantity on the order of count times the typical value (the running total in the first, av*count in the second), which is a large number, before dividing it; that explains why your results are approximate. I would do:
class Average3
{
public:
    Average3() : av( 0 ), count( 0 ) {}
    void add( double value )
    {
        count++;
        av += (value - av)/count;   // only the small correction term is added
    }
    double average()
    {
        return av;
    }
private:
    double av;
    size_t count;
};
But you still lose precision when adding the last numbers, because the correction term (value - av)/count is small compared to av. I'd be glad to know whether my intuition is right, though.
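To put rough numbers on that (my own back-of-the-envelope check, not from the post): near the end of the run the correction term is around 1e-8, while av can only move in steps of one ulp, roughly 1e-16 around 0.5, so every add rounds the result and can lose up to half an ulp of av.

#include <iostream>
#include <cmath>

// Back-of-the-envelope check (my own figures): at count = 1e8 a typical
// correction term is ~5e-9, while doubles around av ~ 0.5 are spaced
// ~1e-16 apart, so each add rounds away up to half of that spacing.
int main()
{
    double av = 0.5;
    double increment = (1.0 - av) / 100000000;    // a typical late correction term
    double ulp = std::nextafter( av, 1.0 ) - av;  // spacing of doubles around av
    std::cout << "correction: " << increment << "\n";
    std::cout << "ulp(av):    " << ulp << "\n";
    return 0;
}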
Upvotes: 0
Reputation: 1553
John D. Cook gives a great analysis; he recommends:
av = av + (value - av)/count;
His series of posts starts with Comparing three methods of computing standard deviation, continues with Theoretical explanation for numerical results, and ends with Accurately computing running variance.
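For reference, here is that update embedded in Welford's running mean/variance algorithm, which is what the three posts build up to. This is a minimal sketch of my own; the class and member names are illustrative:

#include <cstddef>

class RunningStats
{
public:
    RunningStats() : count( 0 ), av( 0 ), m2( 0 ) {}
    void add( double value )
    {
        count++;
        double delta = value - av;
        av += delta/count;           // the recommended mean update
        m2 += delta*(value - av);    // running sum of squared deviations
    }
    double average() const { return av; }
    double variance() const { return count > 1 ? m2/(count - 1) : 0.0; }
private:
    std::size_t count;
    double av;
    double m2;
};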
Upvotes: 3
Reputation: 154
My guess is that the first class gives the more reliable result. In the second case, each iteration introduces some rounding error due to the division by count, and eventually all these small approximations add up to the difference in result that you see. In the first case, instead, you only approximate once, when you compute the final division.
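You can test that intuition with a demo of my own (not from the original post) in which the inputs alternate between 0.25 and 0.75. Both values are exactly representable, so the first method's running total, and its single final division, stay exact and yield exactly 0.5, while the second method still rounds at every multiply, add, and divide and should come out off in the last digits:

#include <iostream>
#include <iomanip>
#include <cstddef>

int main()
{
    double total = 0.0;   // method 1 state
    double av    = 0.0;   // method 2 state
    for ( std::size_t count = 0; count != 10000000; ++count )
    {
        double value = ( count % 2 == 0 ) ? 0.25 : 0.75;
        total += value;                        // stays exact: sums of quarters
        av = (av*count + value)/(count+1);     // up to three roundings per iteration
    }
    std::cout << std::setprecision(20) << total/10000000 << std::endl; // exactly 0.5
    std::cout << std::setprecision(20) << av << std::endl;             // expected to drift
}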
Upvotes: 3
Reputation: 33532
If you really want high precision: use an error-compensated summation such as Python's math.fsum.
Edit: the Python docs for math.fsum also link to this Overview of approaches.
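For illustration, here is a sketch of the classic compensated (Kahan) summation applied to the running average. This is my own example, simpler than the exact-summation algorithm behind math.fsum, but it already recovers most of the error lost by a plain running total:

#include <cstddef>

class KahanAverage
{
public:
    KahanAverage() : total( 0 ), comp( 0 ), count( 0 ) {}
    void add( double value )
    {
        double y = value - comp;   // re-inject the error lost on the last add
        double t = total + y;      // big + small: low-order bits of y are lost...
        comp = (t - total) - y;    // ...and captured here for the next add
        total = t;
        count++;
    }
    double average() const { return total/count; }
private:
    double total;
    double comp;
    std::size_t count;
};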
Upvotes: 8