Reputation: 21514
I was comparing two algorithms computing the average of random numbers.
I suppose there's nothing revolutionary here, and I'm not a mathematician so I can't put a name on those two algorithms.
Here is my code:
#include <iostream>
#include <iomanip>
#include <cstdlib>

// Method 1: accumulate a running total, divide once at the end.
class Average1
{
public:
    Average1() : total( 0 ), count( 0 ) {}
    void add( double value )
    {
        total += value;
        count++;
    }
    double average()
    {
        return total/count;
    }
private:
    double total;
    size_t count;
};

// Method 2: maintain the running average itself, rescaling it on every add.
class Average2
{
public:
    Average2() : av( 0 ), count( 0 ) {}
    void add( double value )
    {
        av = (av*count + value)/(count+1);
        count++;
    }
    double average()
    {
        return av;
    }
private:
    double av;
    size_t count;
};

void compare()
{
    Average1 av1;
    Average2 av2;
    double temp;
    for ( size_t i = 0; i != 100000000; ++i )
    {
        // the same uniform value in [0,1] is fed to both methods
        temp = static_cast<double>(std::rand()) / static_cast<double>(RAND_MAX);
        av1.add( temp );
        av2.add( temp );
    }
    std::cout << std::setprecision(20) << av1.average() << std::endl;
    std::cout << std::setprecision(20) << av2.average() << std::endl;
}

int main()
{
    compare();
    return 0;
}
Output is:
0.50001084285722707801
0.50001084285744978875
The difference is certainly due to double type precision.
In the end, which one is the right method? Which one gives the real mathematical average (or the closest to it)?
Upvotes: 4
Views: 1066
Reputation: 1512
My own thinking is that both methods manipulate a quantity on the order of count times the typical value (the running total in the first, av*count in the second), which is a large number, before dividing it; that explains why your results are approximate. I would do:
class Average3
{
public:
    Average3() : av( 0 ), count( 0 ) {}
    void add( double value )
    {
        count++;
        av += (value - av)/count;   // only the small correction term is added
    }
    double average()
    {
        return av;
    }
private:
    double av;
    size_t count;
};
But you still lose precision when adding the last numbers, because the correction term (value - av)/count is small compared to av. I'd be glad to know whether my intuition is right, though.
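To put rough numbers on that (my own back-of-the-envelope check, not from the post): near the end of the run the correction term is around 1e-8, while av can only move in steps of one ulp, roughly 1e-16 around 0.5, so every add rounds the result and can lose up to half an ulp of av.

#include <iostream>
#include <cmath>

// Back-of-the-envelope check (my own figures): at count = 1e8 a typical
// correction term is ~5e-9, while doubles around av ~ 0.5 are spaced
// ~1e-16 apart, so each add rounds away up to half of that spacing.
int main()
{
    double av = 0.5;
    double increment = (1.0 - av) / 100000000;    // a typical late correction term
    double ulp = std::nextafter( av, 1.0 ) - av;  // spacing of doubles around av
    std::cout << "correction: " << increment << "\n";
    std::cout << "ulp(av):    " << ulp << "\n";
    return 0;
}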
Upvotes: 0
Reputation: 1553
John D. Cook gives a great analysis; he recommends:
av = av + (value - av)/count;
His series of posts starts with Comparing three methods of computing standard deviation, continues with Theoretical explanation for numerical results, and ends with Accurately computing running variance.
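For reference, here is that update embedded in Welford's running mean/variance algorithm, which is what the three posts build up to. This is a minimal sketch of my own; the class and member names are illustrative:

#include <cstddef>

class RunningStats
{
public:
    RunningStats() : count( 0 ), av( 0 ), m2( 0 ) {}
    void add( double value )
    {
        count++;
        double delta = value - av;
        av += delta/count;           // the recommended mean update
        m2 += delta*(value - av);    // running sum of squared deviations
    }
    double average() const { return av; }
    double variance() const { return count > 1 ? m2/(count - 1) : 0.0; }
private:
    std::size_t count;
    double av;
    double m2;
};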
Upvotes: 3
Reputation: 154
My guess is that the first class gives the more reliable result. In the second case, each iteration introduces some rounding error due to the division by count, and eventually all these small approximations add up to the difference in result that you see. In the first case, instead, you only approximate once, when you compute the final division.
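You can test that intuition with a demo of my own (not from the original post) in which the inputs alternate between 0.25 and 0.75. Both values are exactly representable, so the first method's running total, and its single final division, stay exact and yield exactly 0.5, while the second method still rounds at every multiply, add, and divide and should come out off in the last digits:

#include <iostream>
#include <iomanip>
#include <cstddef>

int main()
{
    double total = 0.0;   // method 1 state
    double av    = 0.0;   // method 2 state
    for ( std::size_t count = 0; count != 10000000; ++count )
    {
        double value = ( count % 2 == 0 ) ? 0.25 : 0.75;
        total += value;                        // stays exact: sums of quarters
        av = (av*count + value)/(count+1);     // up to three roundings per iteration
    }
    std::cout << std::setprecision(20) << total/10000000 << std::endl; // exactly 0.5
    std::cout << std::setprecision(20) << av << std::endl;             // expected to drift
}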
Upvotes: 3
Reputation: 33532
If you really want high precision: use an error-compensated summation such as Python's math.fsum.
Edit: the Python docs for math.fsum also link to this Overview of approaches.
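For illustration, here is a sketch of the classic compensated (Kahan) summation applied to the running average. This is my own example, simpler than the exact-summation algorithm behind math.fsum, but it already recovers most of the error lost by a plain running total:

#include <cstddef>

class KahanAverage
{
public:
    KahanAverage() : total( 0 ), comp( 0 ), count( 0 ) {}
    void add( double value )
    {
        double y = value - comp;   // re-inject the error lost on the last add
        double t = total + y;      // big + small: low-order bits of y are lost...
        comp = (t - total) - y;    // ...and captured here for the next add
        total = t;
        count++;
    }
    double average() const { return total/count; }
private:
    double total;
    double comp;
    std::size_t count;
};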
Upvotes: 8