tetsballer

Reputation: 11

SQL Server STDEV() Function for decimals not matching Excel STDEV() Function

Just wondering if anyone knows the formula behind STDEV() (for any version of SQL Server), or why the standard deviation for decimal values such as these comes out different.

My goal is to explain this difference in calculation to our QA department who can then explain to our customer.

The SQL STDEV() function never seems to match Excel for decimal values, while whole-number integers match with no problem.

I am trying to calculate the standard deviation of a sample, not a population.

Example values: 99.99991, 99.99992

Excel STDEV (or any online standard deviation calculator): 7.07107E-06

SQL: 7.13664510111607E-06
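
For what it's worth, for two values the sample standard deviation is just |x1 - x2| / sqrt(2), so the Excel figure is easy to sanity-check by hand (quick check, not production code):

select abs(99.99992 - 99.99991) / sqrt(2.0)   -- comes out around 7.07107E-06, matching Excel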

I have tried multiple versions of SQL Server, and all variations of this function in SQL and in Excel 2007 and 2019, and I can never get them to match for these values.

SQL is clearly doing something slightly different from the calculation on this website (see the sketch after the repro below):

https://www.mathsisfun.com/data/standard-deviation-calculator.html

T-SQL to reproduce:

declare @table table (theNumber float)
insert into @table (theNumber) Values (99.99991), (99.99992)
select STDEV(theNumber) from @table
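
My guess (I haven't found this documented anywhere) is that STDEV uses a single-pass "sum of squares" formula in float arithmetic, which loses precision when the values are this close together, whereas Excel effectively computes the mean first and then the squared deviations. Here is a rough sketch of the two approaches side by side; the single-pass version may not reproduce SQL's digits exactly, but it drifts in the same way:

declare @t table (x float)
insert into @t (x) values (99.99991), (99.99992)

-- single-pass: sqrt( (n * sum(x^2) - (sum(x))^2) / (n * (n - 1)) )
select sqrt( (count(*) * sum(x * x) - square(sum(x)))
             / (count(*) * (count(*) - 1.0)) ) as OnePass
from @t

-- two-pass: compute the mean first, then average the squared deviations
declare @mean float
select @mean = avg(x) from @t
select sqrt( sum(square(x - @mean)) / (count(*) - 1) ) as TwoPass
from @t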

Any help would be much appreciated!

Thanks!

Edit: Here is a function to use if anyone else runs into this issue:

CREATE FUNCTION [dbo].[StandardDeviationSample] (@Values AS TVP_FLOAT READONLY) RETURNS DECIMAL(18,9)
AS

BEGIN

-- Two-pass sample standard deviation over a table-valued parameter.
-- Assumes TVP_FLOAT is a table type with a single FLOAT column named ID.

DECLARE @ItemCount                  INT,
        @Sum                        FLOAT,
        @Mean                       FLOAT,
        @SumOfDifferencesSquared    FLOAT,
        @Variance                   FLOAT

-------------------------------------------------------------------------------

DECLARE @Differences TABLE
(
    [Value] FLOAT
)

DECLARE @DifferencesSquared TABLE
(
    [Value] FLOAT
)

SELECT @ItemCount = (SELECT Count(ID) FROM @Values)

-- A sample standard deviation needs at least two values
IF (@ItemCount <= 1)
    RETURN 0

-- First pass: the mean
SELECT @Sum = (SELECT Sum(ID) FROM @Values)

SELECT @Mean = (@Sum / @ItemCount)

-- Second pass: squared deviations from the mean
INSERT INTO @Differences ([Value])
SELECT      (ID - @Mean)
FROM        @Values

INSERT INTO @DifferencesSquared ([Value])
SELECT      Square([Value])
FROM        @Differences

SELECT @SumOfDifferencesSquared = Sum([Value])
FROM   @DifferencesSquared

-- Divide by (n - 1) for the sample (not population) variance
SELECT @Variance = @SumOfDifferencesSquared / (@ItemCount - 1)

RETURN Convert(Decimal(18,9), Sqrt(@Variance))

END
GO
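
For reference, here is how it gets called, assuming TVP_FLOAT is a user-defined table type with a single FLOAT column named ID (create the type first if it doesn't exist):

-- CREATE TYPE dbo.TVP_FLOAT AS TABLE (ID FLOAT)

DECLARE @Numbers dbo.TVP_FLOAT
INSERT INTO @Numbers (ID) VALUES (99.99991), (99.99992)
SELECT [dbo].[StandardDeviationSample](@Numbers)   -- 0.000007071 for the example values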

Upvotes: 1

Views: 710

Answers (1)

Damien_The_Unbeliever

Reputation: 239684

If the inputs are only known to 5 decimal places, then reading much into the decimal places of the standard deviation result is questionable at best.

Consider:

declare @table table (groupNo int, theNumber float)
insert into @table (groupNo, theNumber) Values
(1, 99.9999051), (1,99.9999249),
(2, 99.99991), (2,99.99992),
(3, 99.9999149), (3,99.9999151)
select groupNo,ROUND(theNumber,5) from @table
select groupNo,STDEV(theNumber) from @table group by groupNo

The first result set is this:

groupNo (No column name)
1   99.99991
1   99.99992
2   99.99991
2   99.99992
3   99.99991
3   99.99992

The second result set is this:

groupNo (No column name)
1   1.40160927359572E-05
2   7.13664510111607E-06
3   1.9073486328125E-06

So I'd suggest the most you should present to your users for this calculation is 7E-06, and even that is sketchy; at that point both SQL and Excel are in agreement.
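
Just to illustrate that, rounding both reported results to six decimal places gives the same number:

select round(7.07107E-06, 6) as ExcelResult, round(7.13664510111607E-06, 6) as SqlResult
-- both come back as 7E-06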

Upvotes: 0
