VAAA
VAAA

Reputation: 15039

SQL - Statistics normal distribution amount range

I have a SQL Server Database table with about 100K records of invoices (date, provider, type and amount).

This is sample data:

enter image description here

I want to group my data into 4 segments depending on the amount.

  1. Group 1: < X1
  2. Group 2: Between X1 & X2
  3. Group 3: Between X2 & X3
  4. Group 4: > X3

Where X1, X2 and X3 are amounts, but these amounts should be calculated statistically based on normal distribution so they make sense.

So I guess I have to get my AVG(amount) and my STDEV(amount) in order to know what my X1, X2 and X3 values should be acceptable.

Any clue on how to do this so the values makes sense for a report so I can generate a chart like this:

enter image description here

Upvotes: 1

Views: 1812

Answers (1)

Tim Biegeleisen
Tim Biegeleisen

Reputation: 521914

SELECT CASE WHEN Amount < (SELECT AVG(Amount) FROM yourTable) -
                          (SELECT STDEVP(Amount) FROM yourTable)
            THEN 'Group 1'
            WHEN Amount < (SELECT AVG(Amount) FROM yourTable) THEN 'Group 2'
            WHEN Amount < (SELECT AVG(Amount) FROM yourTable) +
                          (SELECT STDEVP(Amount) FROM yourTable)
            THEN 'Group 3'
            ELSE 'Group 4'
       END
FROM yourTable

Explanation:

I assume that your boundaries correspond to the following:

X1 - One standard deviation below the mean Amount
X2 - The mean value of the Amount
X3 - One standard deviation about the mean Amount

Hence I assume that your X markers define the quartiles of the Amount distribution. Subqueries can be used for both the mean and standard deviation, and the AVG and STDEVP functions can be used for this.

Upvotes: 1

Related Questions