user2381588

Reputation: 11

What is the purpose of boost accumulator error_of<mean>?

The documentation of the error_of< mean > feature of Boost.Accumulators states that it calculates the error of the mean with the formula:

sqrt(variance / (count - 1)),

where the variance is calculated by:

variance = 1/count * sum[ (x_i - x_m)^2 ],

where the sum runs over all values x_i (i = 1..count) of the sample and x_m is the mean. Substituting this into the first formula gives the expression that is actually used for the error:

sqrt( 1/(count*(count - 1)) * sum[ (x_i - x_m)^2 ] ),

According to Wikipedia, one can estimate the standard deviation with either the uncorrected or the corrected sample standard deviation. The latter is calculated as:

sqrt(1/(count-1) * sum[ (x_i - x_m)^2] )

This is the formula I normally use to calculate the error of a mean value. So what is the purpose of error_of< mean >, and which error does it calculate?

Upvotes: 1

Views: 913

Answers (2)

TemplateRex

Reputation: 70556

The overall formula of Boost.Accumulators is indeed correct, but it is computed in a somewhat non-standard fashion.

First, the sample variance is simply the average of the squared deviations

V_sample = sum[ (x_i - x_m)^2] / count
s_sample = sqrt[ V_sample ] 

but s_sample is a biased estimator of the population standard deviation sigma. The usual corrected estimator, the square root of the unbiased (Bessel-corrected) estimator of the population variance, is

s_pop = s_sample * sqrt[ count / (count - 1) ]
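Substituting s_sample = sqrt[ V_sample ] shows that s_pop is exactly the corrected sample standard deviation quoted in the question. The derivation below only transcribes the formulas above into LaTeX notation:

```latex
s_{\mathrm{pop}}
  = s_{\mathrm{sample}} \sqrt{\frac{\mathrm{count}}{\mathrm{count}-1}}
  = \sqrt{\frac{1}{\mathrm{count}} \sum_{i} (x_i - x_m)^2 \cdot \frac{\mathrm{count}}{\mathrm{count}-1}}
  = \sqrt{\frac{1}{\mathrm{count}-1} \sum_{i} (x_i - x_m)^2}
```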

Second, the standard error of the mean measures how precisely the sample arithmetic mean estimates the population mean mu. It is the standard deviation of the sample mean itself, not of the individual values. You can use it to construct confidence intervals for mu around the sample mean.

The standard error of the mean is estimated by dividing the estimator s_pop of the population standard deviation by the square root of the number of observations:

s_mean = s_pop / sqrt[ count ]

Boost.Accumulators computes s_mean as

s_mean = s_sample / sqrt[count - 1]

but the two expressions are equivalent, as direct substitution of the relation between s_pop and s_sample readily shows.
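Written out, the substitution runs as follows. The last form is the sqrt(variance / (count - 1)) expression from the documentation, where variance is the 1/count quantity V_sample:

```latex
s_{\mathrm{mean}}
  = \frac{s_{\mathrm{pop}}}{\sqrt{\mathrm{count}}}
  = \frac{s_{\mathrm{sample}}}{\sqrt{\mathrm{count}}} \sqrt{\frac{\mathrm{count}}{\mathrm{count}-1}}
  = \frac{s_{\mathrm{sample}}}{\sqrt{\mathrm{count}-1}}
  = \sqrt{\frac{V_{\mathrm{sample}}}{\mathrm{count}-1}}
```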

NOTE: I think it would be useful for Boost.Accumulators to also define these two versions of the standard deviation.

Upvotes: 3

Eric Niebler

Reputation: 6167

I'm the current maintainer of Boost.Accumulators, and I wrote much of it, but not the math-y bits. I deferred all such decisions to a domain expert who worked with me closely. I put your question to him. This is the answer I got:

The standard deviation is not the error of the mean. Our equation is correct.

<shrug> It's not the most illuminating answer, but maybe it helps?

Upvotes: 2
