jpo38

Reputation: 21514

Best strategy to compute average with high precision

I was comparing two algorithms computing the average of random numbers.

I suppose there's nothing revolutionary here, and I'm not a mathematician, so I can't put a name to these two algorithms.

Here is my code:

#include <iostream>
#include <iomanip>
#include <cstdlib>

class Average1
{
public:
    Average1() : total( 0 ), count( 0 ) {}

    void add( double value )
    {
        total += value;
        count++;
    }

    double average()
    {
        return total/count;
    }

private:
    double total;
    size_t count;
};

class Average2
{
public:
    Average2() : av( 0 ), count( 0 ) {}

    void add( double value )
    {
        av = (av*count + value)/(count+1);
        count++;
    }

    double average()
    {
        return av;
    }

private:
    double av;
    size_t count;
};

void compare()
{
    Average1 av1;
    Average2 av2;
    double temp;
    for ( size_t i = 0; i != 100000000; ++i )
    {
        temp = static_cast<double>(std::rand()) / static_cast<double>(RAND_MAX);
        av1.add( temp );
        av2.add( temp );
    }

    std::cout << std::setprecision(20) << av1.average() << std::endl;
    std::cout << std::setprecision(20) << av2.average() << std::endl;
}

int main()
{
    compare();
    return 0;
}

Output is:

0.50001084285722707801
0.50001084285744978875

The difference is certainly due to the limited precision of the double type.

In the end, which one is the good method? Which one gives the real mathematical average (or the closest to it)?

Upvotes: 4

Views: 1066

Answers (4)

Diane M

Reputation: 1512

My own thinking is that both versions compute a quantity on the order of count times value, which is a large number, before dividing, and that explains why your results are approximate. I would do:

class Average3
{
public:
    Average3() : av( 0 ), count( 0 ) {}

    void add( double value )
    {
        count++;
        av += (value - av)/count;
    }

    double average()
    {
        return av;
    }

private:
    double av;
    size_t count;
};

But you still lose precision when adding the last numbers, because (value - av)/count becomes small compared to the average. I'd be glad to know whether my intuition was right, though.
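A quick way to check that intuition (a minimal sketch; it assumes the Average1 class from the question and the Average3 class above, and uses a long double running sum as a higher-precision reference, on platforms where long double is wider than double):

#include <cstdlib>
#include <iomanip>
#include <iostream>

int main()
{
    Average1 av1;
    Average3 av3;
    long double refTotal = 0.0L; // higher-precision reference accumulator
    const size_t n = 100000000;
    for ( size_t i = 0; i != n; ++i )
    {
        double v = static_cast<double>(std::rand()) / static_cast<double>(RAND_MAX);
        av1.add( v );
        av3.add( v );
        refTotal += v; // accumulate in long double, divide once at the end
    }
    std::cout << std::setprecision(20) << "reference: " << static_cast<double>(refTotal / n) << std::endl;
    std::cout << std::setprecision(20) << "Average1:  " << av1.average() << std::endl;
    std::cout << std::setprecision(20) << "Average3:  " << av3.average() << std::endl;
    return 0;
}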

Upvotes: 0

Jacob

Reputation: 1553

John D. Cook gives a great analysis; he recommends:

av = av + (value - av)/count;

His posts start with Comparing three methods of computing standard deviation, continue with Theoretical explanation for numerical results, and conclude with Accurately computing running variance.
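For context, here is a minimal sketch of the one-pass mean/variance recurrence those posts analyze (Welford's method); the class name and layout are illustrative, not Cook's:

#include <cstddef>

class RunningStats
{
public:
    RunningStats() : count( 0 ), mean( 0.0 ), m2( 0.0 ) {}

    void add( double value )
    {
        count++;
        double delta = value - mean;
        mean += delta / count;          // the update Cook recommends
        m2 += delta * (value - mean);   // running sum of squared deviations
    }

    double average() const { return mean; }
    double variance() const { return count > 1 ? m2 / (count - 1) : 0.0; } // sample variance

private:
    std::size_t count;
    double mean;
    double m2;
};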

Upvotes: 3

lupod

Reputation: 154

My guess is that the first class gives the more reliable result. In the second case, each iteration introduces a rounding error from the division by count, and eventually all these small errors add up to the difference you see. In the first case, instead, you only round once, when you compute the final division.

Upvotes: 3

sascha

Reputation: 33532

If you really want high precision: use exact summation, as implemented for example by Python's math.fsum.

Edit: the Python docs for math.fsum also link to this Overview of approaches.
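In C++, one approach from that overview is Kahan (compensated) summation, which carries a correction term so that small addends are not swallowed by a large running total. A minimal sketch (the class name is illustrative):

#include <cstddef>

class KahanAverage
{
public:
    KahanAverage() : total( 0.0 ), compensation( 0.0 ), count( 0 ) {}

    void add( double value )
    {
        double y = value - compensation;    // apply the correction carried over
        double t = total + y;               // low-order bits of y may be lost here...
        compensation = (t - total) - y;     // ...so recover them for the next add
        total = t;
        count++;
        // note: aggressive optimisations (e.g. -ffast-math) can defeat this scheme
    }

    double average() const { return total / count; }

private:
    double total;
    double compensation;
    std::size_t count;
};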

Upvotes: 8
