Reputation: 13769
Does LINQ model the aggregate SQL function STDDEV()
(standard deviation)?
If not, what is the simplest / best-practices way to calculate it?
Example:
SELECT test_id, AVERAGE(result) avg, STDDEV(result) std
FROM tests
GROUP BY test_id
Upvotes: 98
Views: 41657
Reputation: 13266
Dynami's answer works but makes multiple passes through the data to get a result. This is a single pass method that calculates the sample standard deviation:
public static double StdDev(this IEnumerable<double> values)
{
double mean = 0.0;
double sum = 0.0;
double stdDev = 0.0;
int n = 0;
foreach (double val in values)
{
n++;
double delta = val - mean;
mean += delta / n;
sum += delta * (val - mean);
}
if (1 < n)
stdDev = Math.Sqrt(sum / (n - 1));
return stdDev;
}
This is the sample standard deviation since it divides by n - 1
. For the normal standard deviation you need to divide by n
instead.
This uses Welford's method which has higher numerical accuracy compared to the Average(x^2)-Average(x)^2
method.
Upvotes: 79
Reputation: 186748
In general case we want to compute StdDev
in one pass: what if values
is file or RDBMS cursor
which can be changed between computing average and sum? We are going to have inconsistent result. The
code below uses just one pass:
// Population StdDev
public static double StdDev(this IEnumerable<double> values) {
if (null == values)
throw new ArgumentNullException(nameof(values));
double N = 0;
double Sx = 0.0;
double Sxx = 0.0;
foreach (double x in values) {
N += 1;
Sx += x;
Sxx += x * x;
}
return N == 0
? double.NaN // or throw exception
: Math.Sqrt((Sxx - Sx * Sx / N) / N);
}
The very same idea for sample StdDev
:
// Sample StdDev
public static double StdDev(this IEnumerable<double> values) {
if (null == values)
throw new ArgumentNullException(nameof(values));
double N = 0;
double Sx = 0.0;
double Sxx = 0.0;
foreach (double x in values) {
N += 1;
Sx += x;
Sxx += x * x;
}
return N <= 1
? double.NaN // or throw exception
: Math.Sqrt((Sxx - Sx * Sx / N) / (N - 1));
}
Upvotes: 1
Reputation: 2053
Simple 4 lines, I used a List of doubles but one could use IEnumerable<int> values
public static double GetStandardDeviation(List<double> values)
{
double avg = values.Average();
double sum = values.Sum(v => (v - avg) * (v - avg));
double denominator = values.Count - 1;
return denominator > 0.0 ? Math.Sqrt(sum / denominator) : -1;
}
Upvotes: 1
Reputation: 21
Straight to the point (and C# > 6.0), Dynamis answer becomes this:
public static double StdDev(this IEnumerable<double> values)
{
var count = values?.Count() ?? 0;
if (count <= 1) return 0;
var avg = values.Average();
var sum = values.Sum(d => Math.Pow(d - avg, 2));
return Math.Sqrt(sum / count);
}
Edit 2020-08-27:
I took @David Clarke comments to make some performance tests and this are the results:
public static (double stdDev, double avg) StdDevFast(this List<double> values)
{
var count = values?.Count ?? 0;
if (count <= 1) return (0, 0);
var avg = GetAverage(values);
var sum = GetSumOfSquareDiff(values, avg);
return (Math.Sqrt(sum / count), avg);
}
private static double GetAverage(List<double> values)
{
double sum = 0.0;
for (int i = 0; i < values.Count; i++)
sum += values[i];
return sum / values.Count;
}
private static double GetSumOfSquareDiff(List<double> values, double avg)
{
double sum = 0.0;
for (int i = 0; i < values.Count; i++)
{
var diff = values[i] - avg;
sum += diff * diff;
}
return sum;
}
I tested this with a list of one million random doubles
the original implementation had an runtime of ~48ms
the performance optimized implementation 2-3ms
so this is an significant improvement.
Some interesting details:
getting rid of Math.Pow brings a boost of 33ms!
List instead of IEnumerable 6ms
manually Average calculation 4ms
For-loops instead of ForEach-loops 2ms
Array instead of List brings just an improvement of ~2% so i skipped this
using single instead of double brings nothing
Further lowering the code and using goto (yes GOTO... haven't used this since the 90s assembler...) instead of for-loops
does not pay, Thank goodness!
I have tested also parallel calculation, this makes sense on list > 200.000 items It seems that Hardware and Software needs to initialize a lot and this is for small lists contra-productive.
All tests were executed two times in a row to get rid of the warmup-time.
Upvotes: 2
Reputation: 181
public static double StdDev(this IEnumerable<int> values, bool as_sample = false)
{
var count = values.Count();
if (count > 0) // check for divide by zero
// Get the mean.
double mean = values.Sum() / count;
// Get the sum of the squares of the differences
// between the values and the mean.
var squares_query =
from int value in values
select (value - mean) * (value - mean);
double sum_of_squares = squares_query.Sum();
return Math.Sqrt(sum_of_squares / (count - (as_sample ? 1 : 0)))
}
Upvotes: 0
Reputation: 253
var stddev = Math.Sqrt(data.Average(z=>z*z)-Math.Pow(data.Average(),2));
Upvotes: 7
Reputation: 5036
You can make your own extension calculating it
public static class Extensions
{
public static double StdDev(this IEnumerable<double> values)
{
double ret = 0;
int count = values.Count();
if (count > 1)
{
//Compute the Average
double avg = values.Average();
//Perform the Sum of (value-avg)^2
double sum = values.Sum(d => (d - avg) * (d - avg));
//Put it all together
ret = Math.Sqrt(sum / count);
}
return ret;
}
}
If you have a sample of the population rather than the whole population, then you should use ret = Math.Sqrt(sum / (count - 1));
.
Transformed into extension from Adding Standard Deviation to LINQ by Chris Bennett.
Upvotes: 115
Reputation: 341
This converts David Clarke's answer into an extension that follows the same form as the other aggregate LINQ functions like Average.
Usage would be: var stdev = data.StdDev(o => o.number)
public static class Extensions
{
public static double StdDev<T>(this IEnumerable<T> list, Func<T, double> values)
{
// ref: https://stackoverflow.com/questions/2253874/linq-equivalent-for-standard-deviation
// ref: http://warrenseen.com/blog/2006/03/13/how-to-calculate-standard-deviation/
var mean = 0.0;
var sum = 0.0;
var stdDev = 0.0;
var n = 0;
foreach (var value in list.Select(values))
{
n++;
var delta = value - mean;
mean += delta / n;
sum += delta * (value - mean);
}
if (1 < n)
stdDev = Math.Sqrt(sum / (n - 1));
return stdDev;
}
}
Upvotes: 34