Finding the standard deviation of a list of deviations/elements

Question

I have a list of sets and some basic statistics for each one (number of items, min, max, mean, stddev). I would like to calculate the same statistics for all of the sets combined. Calculating the total count, min, max, and mean is easy, but I'm unsure how to calculate the total standard deviation.

The data looks like this:

Count        Max      Min      Mean      Stddev
1,027,671    781      68       57.8      32.79
  839,473    552      54       61.3      48.53
3,012,102    890      41       64.9      41.92

The statistics for all of the sets combined would be:

4,879,246    890      41       62.8      ???
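The missing value can be recovered from the count, mean, and stddev columns alone. This is a minimal sketch that assumes the Stddev column holds population standard deviations (dividing by n). Under that assumption, each set's sum is n·mean and its sum of squares is n·(stddev² + mean²). Both quantities simply add across sets. The function name `combine` is hypothetical. If the column holds sample standard deviations instead, the sum of squares is (n - 1)·stddev² + n·mean², and the final variance divides by N - 1.

```python
import math

# Per-set summary statistics from the table above: (count, mean, stddev).
# Assumption: Stddev is the population standard deviation (divides by n).
SETS = [
    (1_027_671, 57.8, 32.79),
    (839_473, 61.3, 48.53),
    (3_012_102, 64.9, 41.92),
]


def combine(sets):
    """Return (count, mean, stddev) for the union of several sets.

    Each set's count, mean and stddev are turned back into a sum and a
    sum of squares, which can be added across sets.
    """
    total_n = 0
    total_sum = 0.0
    total_sq = 0.0
    for n, mean, sd in sets:
        total_n += n
        total_sum += n * mean
        # Sum of x^2 = n * (population variance + mean^2)
        total_sq += n * (sd * sd + mean * mean)
    mean = total_sum / total_n
    variance = total_sq / total_n - mean * mean
    # Guard against tiny negative values caused by floating-point rounding.
    return total_n, mean, math.sqrt(max(variance, 0.0))


if __name__ == "__main__":
    n, mean, sd = combine(SETS)
    print(f"{n:,} {mean:.1f} {sd:.2f}")
```

Run as a script, it prints the combined count and mean (4,879,246 and 62.8, matching the table) along with the combined standard deviation.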

Rob Neuhaus · Accepted Answer

I assume you are writing the code that maintains the distribution, and not just consuming data that already has the standard deviation computed. The standard deviation isn't a very natural quantity for a computer to maintain. Instead, you should maintain the number of items, their sum, and the sum of their squares. From those three raw values you can easily compute the mean and standard deviation of the distribution. Because counts, sums, and sums of squares simply add, merging two distributions is trivial. I use this strategy in the code here, where the add operation supports merging two distributions. Notice how simple its implementation is: http://github.com/rrenaud/dominionstats/blob/master/stats.py#L17
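This is a minimal sketch of the strategy described above. It is not the code from the linked stats.py. The class and method names (`RunningStats`, `add_value`, `merge`) are hypothetical. It keeps only the count, the sum, and the sum of squares. Merging is plain addition, and the mean and (population) standard deviation are derived on demand.

```python
import math


class RunningStats:
    """Hypothetical accumulator storing only count, sum and sum of squares."""

    def __init__(self):
        self.count = 0
        self.total = 0.0
        self.total_sq = 0.0

    def add_value(self, x):
        """Record a single observation."""
        self.count += 1
        self.total += x
        self.total_sq += x * x

    def merge(self, other):
        """Fold another distribution into this one by adding the raw sums."""
        self.count += other.count
        self.total += other.total
        self.total_sq += other.total_sq

    def mean(self):
        return self.total / self.count

    def stddev(self):
        """Population standard deviation: sqrt(E[x^2] - E[x]^2)."""
        m = self.mean()
        variance = self.total_sq / self.count - m * m
        # Rounding can push the variance slightly below zero; clamp it.
        return math.sqrt(max(variance, 0.0))


if __name__ == "__main__":
    a = RunningStats()
    for x in [1.0, 2.0, 3.0]:
        a.add_value(x)
    b = RunningStats()
    for x in [4.0, 5.0]:
        b.add_value(x)
    a.merge(b)
    print(a.count, a.mean(), a.stddev())
```

One design caveat: E[x²] - E[x]² can lose precision when the mean is large relative to the spread. A numerically safer variant stores the count, the mean, and the sum of squared deviations (M2), then merges with the pairwise update from Chan et al.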
