Reputation: 4140
If I do a standard deviation calculation for a sample using this code modified somewhat from this SO question:
public double CalculateStandardDeviation(List<double> values, bool sample = false)
{
    double mean = 0.0;
    double sum = 0.0;
    double stdDev = 0.0;
    int count = 0;

    // Welford's single-pass algorithm: update the running mean and the
    // running sum of squared deviations from the mean as each value arrives.
    foreach (double val in values)
    {
        count++;
        double delta = val - mean;
        mean += delta / count;
        sum += delta * (val - mean);
    }

    // Divide by N - 1 for a sample, N for a population.
    if (1 < count)
        stdDev = Math.Sqrt(sum / (count - (sample ? 1 : 0)));

    return stdDev;
}
Using this unit test:
[Test]
public void Sample_Standard_Deviation_Returns_Expected_Value()
{
    //original cite: http://warrenseen.com/blog/2006/03/13/how-to-calculate-standard-deviation/
    double expected = 2.23606797749979;
    double tolerance = 1.0 / System.Math.Pow(10, 13);
    var cm = new CommonMath();//a library of math functions we use a lot

    List<double> values = new List<double> { 4.0, 2.0, 5.0, 8.0, 6.0 };
    double actual = cm.CalculateStandardDeviation(values, true);

    Assert.That(actual, Is.EqualTo(expected).Within(tolerance));
}
The test passes with a resultant value within the specified tolerance.
However, if I use this Linq-ified code, it fails, returning a value of 2.5 (as if it were a population standard deviation instead):
double meanOfValues = values.Average();
double sumOfValues = values.Sum();
int countOfValues = values.Count;

double standardDeviationOfValues =
    Math.Sqrt(sumOfValues / (countOfValues - (sample ? 1 : 0)));

return standardDeviationOfValues;
As I've never taken statistics (so please be gentle), the Linq-ification (that's a word) of the values from the list seems like it should give me the same result, but it doesn't, and I don't understand what I've done wrong. The choice between N and N-1 is the same in both, so why isn't the answer the same?
Upvotes: 0
Views: 1314
Reputation: 26917
Your LINQ version does not compute a standard deviation. Standard deviation is based on the sum of the squared differences from the mean, not the sum of the raw values, so change to:
double meanOfValues = values.Average();
double sumOfValues = values.Select(v => (v - meanOfValues) * (v - meanOfValues)).Sum();
int countOfValues = values.Count;

double standardDeviationOfValues =
    Math.Sqrt(sumOfValues / (countOfValues - (sample ? 1 : 0)));

return standardDeviationOfValues;
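For completeness, here is a minimal runnable sketch of that corrected LINQ version wrapped as a method and checked against the question's data (the class and method placement here are illustrative only, not part of the question's CommonMath library):

using System;
using System.Collections.Generic;
using System.Linq;

public static class StdDevSketch
{
    // Illustrative wrapper around the corrected LINQ code above.
    public static double CalculateStandardDeviation(List<double> values, bool sample = false)
    {
        double meanOfValues = values.Average();
        // Sum of squared differences from the mean, not the sum of the raw values.
        double sumOfSquaredDifferences =
            values.Select(v => (v - meanOfValues) * (v - meanOfValues)).Sum();
        int countOfValues = values.Count;
        return Math.Sqrt(sumOfSquaredDifferences / (countOfValues - (sample ? 1 : 0)));
    }

    public static void Main()
    {
        var values = new List<double> { 4.0, 2.0, 5.0, 8.0, 6.0 };
        Console.WriteLine(CalculateStandardDeviation(values, true));  // sample std dev ≈ 2.23606797749979
        Console.WriteLine(CalculateStandardDeviation(values, false)); // population std dev = 2
    }
}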
To traverse values only once, you can use Aggregate, but it isn't any better than an ordinary function:
var g = values.Aggregate(new { mean = 0.0, sum = 0.0, count = 0 },
    (acc, val) =>
    {
        var newcount = acc.count + 1;
        double delta = val - acc.mean;
        var newmean = acc.mean + delta / newcount;
        return new { mean = newmean, sum = acc.sum + delta * (val - newmean), count = newcount };
    });

var stdDev = Math.Sqrt(g.sum / (g.count - (sample ? 1 : 0)));
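With the question's data ({ 4.0, 2.0, 5.0, 8.0, 6.0 }) and sample set to true, the accumulator ends up with mean = 5, sum = 20 and count = 5, so stdDev = Math.Sqrt(20.0 / 4) ≈ 2.23606797749979, matching the loop version.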
Upvotes: 1
Reputation: 56
Let's start with the fact that
values.Sum();
and the sum you're getting from
sum += delta * (val - mean);
are not the same.
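To make that concrete with the question's data (the variable names here are just for illustration), values.Sum() adds up the raw values, while the loop's sum converges to the sum of squared deviations from the mean:

using System;
using System.Collections.Generic;
using System.Linq;

class SumComparison
{
    static void Main()
    {
        var values = new List<double> { 4.0, 2.0, 5.0, 8.0, 6.0 };
        double mean = values.Average();                              // 5

        double sumOfRawValues = values.Sum();                        // 25
        double sumOfSquaredDeviations =
            values.Sum(v => (v - mean) * (v - mean));                // 20 -- what the loop's `sum` ends up as

        Console.WriteLine(sumOfRawValues);         // 25
        Console.WriteLine(sumOfSquaredDeviations); // 20
    }
}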
Next time, you could start by applying TDD to this kind of problem and checking every intermediate value that way.
EDIT: Standard Deviation in LINQ
Upvotes: 0
Reputation: 578
Put sample as false, and you get the same answer: 2.23606797749979. If you put sample as true, you get 2.5!
So you do need to put the same "sample" value in both places.
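(With the question's data, values.Sum() is 25, so that LINQ code computes Math.Sqrt(25.0 / 5) ≈ 2.23606797749979 when sample is false and Math.Sqrt(25.0 / 4) = 2.5 when sample is true.)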
Upvotes: 0