charliehorse55

Reputation: 1990

Finding the standard deviation of a list of deviations/elements

I have a list of sets and some basic statistics for each one (number of items, min, max, mean, stddev). I would like to calculate the same statistics for all of the sets combined. Calculating the total count, min, max, and mean is easy, but I'm unsure how to calculate the total standard deviation.

The data looks like this:

Count        Max      Min      Mean      Stddev
1,027,671    781      68       57.8      32.79
  839,473    552      54       61.3      48.53
3,012,102    890      41       64.9      41.92

Generating the statistics for all of the sets together:

4,879,246    890      41       62.8      ???

Upvotes: 2

Views: 848

Answers (2)

Rob Neuhaus

Reputation: 9290

I assume you are writing the code that maintains the distribution, not just consuming data whose standard deviation has already been computed. The standard deviation isn't a natural quantity for a computer to maintain. Instead, you should maintain the number of items, the sum, and the sum of the squared items; you can then easily compute the mean and standard deviation of the distribution from those three pieces of raw information. I use this strategy in this code: http://github.com/rrenaud/dominionstats/blob/master/stats.py#L17. The add operation supports merging two distributions; notice how simple its implementation is.
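A minimal sketch of the count / sum / sum-of-squares idea, for illustration only. The names `RunningStats`, `add_value`, and `from_summary` are hypothetical and are not taken from the linked stats.py. It assumes the population standard deviation (dividing by n); if the per-set figures were computed with the sample formula (n - 1), the reconstruction would need adjusting. `from_summary` relies on the identity sum of squares = n * (stddev^2 + mean^2) to rebuild the raw sums from an already-summarised set.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Keeps the three raw quantities: count, sum, and sum of squares."""
    count: int = 0
    total: float = 0.0      # sum of the items
    total_sq: float = 0.0   # sum of the squared items

    def add_value(self, x):
        # Fold a single observation into the running totals.
        self.count += 1
        self.total += x
        self.total_sq += x * x

    def __add__(self, other):
        # Merging two distributions is just adding the raw totals.
        return RunningStats(
            self.count + other.count,
            self.total + other.total,
            self.total_sq + other.total_sq,
        )

    @property
    def mean(self):
        return self.total / self.count

    @property
    def stddev(self):
        # Population variance: E[x^2] - (E[x])^2.
        # Clamp at zero to guard against tiny negative rounding errors.
        variance = self.total_sq / self.count - self.mean ** 2
        return math.sqrt(max(variance, 0.0))

    @classmethod
    def from_summary(cls, count, mean, stddev):
        # Assumption: stddev is the population standard deviation.
        return cls(count, count * mean, count * (stddev ** 2 + mean ** 2))


# Usage with the (count, mean, stddev) rows from the question.
rows = [
    (1027671, 57.8, 32.79),
    (839473, 61.3, 48.53),
    (3012102, 64.9, 41.92),
]
combined = RunningStats()
for count, mean, stddev in rows:
    combined = combined + RunningStats.from_summary(count, mean, stddev)

print(combined.count, round(combined.mean, 1), round(combined.stddev, 2))
```

One caveat with this representation: subtracting two large, nearly equal numbers in the variance step can lose precision when the mean is large relative to the spread, so a more numerically careful formulation may be preferable for such data.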

Upvotes: 3

Benjamin Bannier

Reputation: 58594

I think it is impossible to calculate this exactly from the data you have. The problem is that the standard deviation depends on the mean of the combined data set, which isn't necessarily the same as the individual means, and also on the distance of each point from that mean, to which you have no exact (but maybe approximate) access.

Upvotes: 0
