Reputation: 23112
This question borders on a mathematics question but the reason I'm asking it here is because I want a solution using boost. Please let me know if you think this would be better suited to the SE Maths.
I have a sample of error values from a set of arbitrary algorithms:
std::vector<double> errors {/* some values */};
Assuming a normal distribution of the values in errors, I need an algorithm that tells me the floating point value below which any number constitutes at least an n-sigma event. Using the 68–95–99.7 rule, if n were 2 then I would want to know the number below which there is at most a 5% chance of the number existing in the dataset.
double getSigmaEventValue(const std::vector<double>& container, int n);
Now, I have a suspicion that this problem is already solved for me in the boost accumulator library but I lack the mathsy know-how to figure out exactly what I'm looking for.
I know I can get the variance using boost::accumulators::variance, but I'm not aware of any wizardry I can employ to convert a variance to an n-sigma value, so that might not be the best approach. I'm interested in using boost because I already perform a set of time-critical statistics on this dataset (median, mean, variance, min and max) so it's likely that at least some of the calculations required for this will already have been cached.
Upvotes: 0
Views: 187
Reputation: 613451
If your data is normally distributed then calculate the sample mean and sample variance. This defines your fitted distribution. Then calculate quantiles for that distribution. For instance, this question covers that topic from the perspective of Boost: Quantile functions in boost (C++)
Of course, if your data is not normally distributed, and you apparently have no reason to believe it is, then your proposed calculations will be meaningless.
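As a rough sketch of the fit-then-quantile idea: for a normal distribution the lower n-sigma threshold is simply the mean minus n standard deviations, so the "conversion" from variance is just a square root. The version below uses only the standard library so that it stands alone; getSigmaEventValue is the signature from the question, while lowerTailProbability is a hypothetical helper added here for illustration. It assumes the data really is normal and that n counts standard deviations below the mean.

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <stdexcept>
#include <vector>

// Lower n-sigma threshold of a normal distribution fitted to the sample:
// mean - n * (sample standard deviation).
// Assumption: the values in `container` are normally distributed.
double getSigmaEventValue(const std::vector<double>& container, int n)
{
    assert(n >= 0); // n counts standard deviations below the mean
    if (container.size() < 2)
        throw std::invalid_argument("need at least two values to estimate a variance");

    const double count = static_cast<double>(container.size());
    const double mean =
        std::accumulate(container.begin(), container.end(), 0.0) / count;

    // Unbiased sample variance (divides by N - 1).
    double sumSquares = 0.0;
    for (double x : container)
        sumSquares += (x - mean) * (x - mean);
    const double sampleVariance = sumSquares / (count - 1.0);

    return mean - n * std::sqrt(sampleVariance);
}

// Hypothetical helper: one-sided probability that a normally distributed
// value falls below the n-sigma threshold, i.e. the standard normal CDF at -n.
double lowerTailProbability(int n)
{
    return 0.5 * std::erfc(n / std::sqrt(2.0));
}
```

Note that the 68–95–99.7 rule is two-sided: for n = 2 about 5% of values lie outside mean ± 2 sigma, but only about 2.3% lie below mean - 2 sigma. If you already have the mean and variance cached from Boost.Accumulators, you can skip the loops and just take the square root of the variance; as far as I know, that accumulator divides by N rather than N - 1, which matters little for large samples. With Boost.Math the same number should come out of boost::math::quantile applied to a boost::math::normal built from the mean and standard deviation, evaluated at the tail probability.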
Upvotes: 1