Reputation: 11
The documentation of the error_of< mean > feature of Boost.Accumulators states that it calculates the error of a mean value by the formula:
sqrt(variance / (count - 1)),
where the variance is calculated by:
variance = 1/count * sum[ (x_i - x_m)^2 ], where the sum runs over all values x_i (i = 1..count) of the sample and x_m is the mean value. This gives the following formula for the error value:
sqrt( 1/(count * (count - 1)) * sum[ (x_i - x_m)^2 ] ).
Wikipedia states that for the standard deviation, one can use either the uncorrected or the corrected sample standard deviation. The latter is calculated by:
sqrt(1/(count-1) * sum[ (x_i - x_m)^2] )
This is the one I normally use to calculate errors of mean values. So what is the purpose of error_of< mean >? And which error is calculated there?
Upvotes: 1
Views: 913
Reputation: 70556
The overall formula of Boost.Accumulators is indeed correct, but it is computed in a somewhat non-standard fashion.
First, the sample variance is simply the average of the squared deviations
V_sample = sum[ (x_i - x_m)^2] / count
s_sample = sqrt[ V_sample ]
but s_sample is a biased estimator of the population standard deviation sigma. Applying Bessel's correction gives the corrected estimator of the population standard deviation, whose square is an unbiased estimator of the population variance:
s_pop = s_sample * sqrt[ count / (count - 1) ]
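To make the two standard deviations concrete, here is a minimal sketch in plain C++ (standard library only, not Boost.Accumulators). The function names uncorrected_variance, uncorrected_stddev and corrected_stddev are hypothetical and chosen only for this illustration; they follow the V_sample, s_sample and s_pop formulas above.

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// V_sample = sum[ (x_i - x_m)^2 ] / count  (hypothetical helper name)
double uncorrected_variance(const std::vector<double>& xs) {
    assert(!xs.empty());
    const double count = static_cast<double>(xs.size());
    const double mean = std::accumulate(xs.begin(), xs.end(), 0.0) / count;
    double sum_sq = 0.0;
    for (double x : xs) {
        sum_sq += (x - mean) * (x - mean);
    }
    return sum_sq / count;
}

// s_sample = sqrt[ V_sample ]
double uncorrected_stddev(const std::vector<double>& xs) {
    return std::sqrt(uncorrected_variance(xs));
}

// s_pop = s_sample * sqrt[ count / (count - 1) ]; needs at least two values
double corrected_stddev(const std::vector<double>& xs) {
    assert(xs.size() > 1);
    const double count = static_cast<double>(xs.size());
    return uncorrected_stddev(xs) * std::sqrt(count / (count - 1.0));
}
```

For the sample {2, 4, 4, 4, 5, 5, 7, 9} the mean is 5 and the squared deviations sum to 32, so V_sample is 4, s_sample is 2 and s_pop is sqrt(32/7), roughly 2.138.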
Second, the standard error of the mean is the error with which you have measured the mean. You can use it to construct confidence intervals around the sample arithmetic mean, which serves as the estimator of the population mean mu.
The standard error of the mean is the corrected estimator of the population standard deviation, s_pop, divided by the square root of the number of observations:
s_mean = s_pop / sqrt[ count ]
Boost.Accumulators computes s_mean as
s_mean = s_sample / sqrt[count - 1]
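but those two expressions are actually equivalent, as can readily be seen by direct substitution of the relation between s_pop and s_sample:
s_pop / sqrt[ count ] = s_sample * sqrt[ count / (count - 1) ] / sqrt[ count ] = s_sample / sqrt[ count - 1 ]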
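The equivalence can also be checked numerically. The following is a small sketch in plain C++ (standard library only); it does not call Boost.Accumulators, and the names sum_squared_deviations, sem_from_corrected and sem_from_uncorrected are hypothetical, introduced just for this comparison. The second function follows the formula quoted from the error_of< mean > documentation, the first the textbook definition.

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// sum[ (x_i - x_m)^2 ] over the whole sample (hypothetical helper name)
double sum_squared_deviations(const std::vector<double>& xs) {
    assert(!xs.empty());
    const double count = static_cast<double>(xs.size());
    const double mean = std::accumulate(xs.begin(), xs.end(), 0.0) / count;
    double sum_sq = 0.0;
    for (double x : xs) {
        sum_sq += (x - mean) * (x - mean);
    }
    return sum_sq;
}

// Textbook route: s_mean = s_pop / sqrt[ count ],
// with s_pop = sqrt[ sum / (count - 1) ]
double sem_from_corrected(const std::vector<double>& xs) {
    assert(xs.size() > 1);
    const double count = static_cast<double>(xs.size());
    const double s_pop = std::sqrt(sum_squared_deviations(xs) / (count - 1.0));
    return s_pop / std::sqrt(count);
}

// Route described in the documentation: s_mean = s_sample / sqrt[ count - 1 ],
// with s_sample = sqrt[ sum / count ]
double sem_from_uncorrected(const std::vector<double>& xs) {
    assert(xs.size() > 1);
    const double count = static_cast<double>(xs.size());
    const double s_sample = std::sqrt(sum_squared_deviations(xs) / count);
    return s_sample / std::sqrt(count - 1.0);
}
```

For the sample {1.1, 1.2, 1.3} both functions should return about 0.0577 (that is, 0.1 / sqrt[3]), differing at most by floating-point rounding.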
NOTE: I think it would be useful for Boost.Accumulators to also define these two versions of the standard deviation.
Upvotes: 3
Reputation: 6167
I'm the current maintainer of Boost.Accumulators, and I wrote much of it, but not the math-y bits. I deferred all such decisions to a domain expert who worked with me closely. I put your question to him. This is the answer I got:
The standard deviation is not the error of the mean. Our equation is correct.
<shrug> It's not the most illuminating answer, but maybe it helps?
Upvotes: 2