Reputation: 591
Let's say I have a table:
VAL PERSON
1 1
2 1
3 1
4 1
2 2
4 2
6 2
3 3
6 3
9 3
12 3
15 3
And I'd like to calculate the quartiles for each person.
I understand I can easily calculate those for a single person as such:
SELECT
VAL,
NTILE(4) OVER(ORDER BY VAL) AS QUARTILE
FROM t -- t is a placeholder; the table name is not given
WHERE PERSON = 1;
This gets me the desired result:
VAL QUARTILE
1 1
2 2
3 3
4 4
Problem is, I'd like to do this for every person. I know something like this would do the job:
SELECT
PERSON,
VAL,
NTILE(4) OVER(ORDER BY VAL) AS QUARTILE
FROM t -- t is a placeholder; the table name is not given
WHERE PERSON = 1
UNION
SELECT
PERSON,
VAL,
NTILE(4) OVER(ORDER BY VAL) AS QUARTILE
FROM t
WHERE PERSON = 2
UNION
SELECT
PERSON,
VAL,
NTILE(4) OVER(ORDER BY VAL) AS QUARTILE
FROM t
WHERE PERSON = 3
UNION
SELECT
PERSON,
VAL,
NTILE(4) OVER(ORDER BY VAL) AS QUARTILE
FROM t
WHERE PERSON = 4
But what if a new person is added to the table? Then I'd have to change the SQL code. Any suggestions?
Upvotes: 4
Views: 17217
Reputation: 1271201
ntile() doesn't handle ties very well. You can easily see this with an example:
select v.x, ntile(2) over (order by x) as tile
from (values (1), (1), (1), (1)) v(x);
which returns:
x tile
1 1
1 1
1 2
1 2
Same value. Different tiles. This gets worse if you are keeping track of which tile a value is in. Different rows can have different tiles on different runs of the same query -- even when the data does not change.
Normally, you would want rows with the same value to have the same quartile, even when the tiles are not the same size. For this reason, I recommend an explicit calculation using rank() instead:
select t.*,
((seqnum - 1) * 4 / cnt) + 1 as quartile
from (select t.*,
rank() over (partition by person order by val) as seqnum,
count(*) over (partition by person) as cnt
from t
) t;
If you actually want values split among tiles, then use row_number() rather than rank().
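To make the difference concrete, here is a minimal sketch that runs both calculations side by side through Python's built-in sqlite3 module (this assumes an SQLite build with window-function support, 3.25 or later). The table name t and the tied sample values are made up for illustration. It also relies on integer division, which SQLite applies to integer operands; in engines where / returns a decimal, the expression would likely need FLOOR() around the division.

```python
import sqlite3

# Hypothetical data: one person whose lowest value is tied four times.
rows = [(1, 1), (1, 1), (1, 1), (1, 1), (2, 1), (2, 1), (3, 1), (4, 1)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (val INTEGER, person INTEGER)")
conn.executemany("INSERT INTO t (val, person) VALUES (?, ?)", rows)

query = """
SELECT val,
       NTILE(4) OVER (PARTITION BY person ORDER BY val) AS ntile_q,
       (RANK() OVER (PARTITION BY person ORDER BY val) - 1) * 4
           / COUNT(*) OVER (PARTITION BY person) + 1 AS rank_q
FROM t
ORDER BY val, ntile_q
"""
results = conn.execute(query).fetchall()

# Each row is (val, ntile-based quartile, rank-based quartile).
for val, ntile_q, rank_q in results:
    print(val, ntile_q, rank_q)
```

With this data, ntile() spreads the four tied 1s across tiles 1 and 2, while the rank()-based expression puts all of them in quartile 1 and leaves quartile 2 empty.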
Upvotes: -1
Reputation: 805
Why don't you try using PARTITION BY?
SELECT
PERSON,
VAL,
NTILE(4) OVER(PARTITION BY PERSON ORDER BY VAL) AS QUARTILE
FROM t; -- t is a placeholder for your table name
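As a quick check, the PARTITION BY query can be run against the sample data from the question. This is only a sketch using Python's built-in sqlite3 module (assuming an SQLite build with window functions, 3.25 or later); the table name t is a placeholder, since the question does not name the table.

```python
import sqlite3

# Sample data from the question, as (VAL, PERSON) pairs.
rows = [(1, 1), (2, 1), (3, 1), (4, 1),
        (2, 2), (4, 2), (6, 2),
        (3, 3), (6, 3), (9, 3), (12, 3), (15, 3)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (val INTEGER, person INTEGER)")
conn.executemany("INSERT INTO t (val, person) VALUES (?, ?)", rows)

# One query covers every person; NTILE restarts for each partition.
query = """
SELECT person, val,
       NTILE(4) OVER (PARTITION BY person ORDER BY val) AS quartile
FROM t
ORDER BY person, val
"""
quartiles = conn.execute(query).fetchall()

for person, val, quartile in quartiles:
    print(person, val, quartile)
```

A new person added to the table is picked up automatically, with no change to the query. Note that person 2 has only three rows, so only tiles 1 to 3 are assigned there.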
Greetings
Upvotes: 6