Reputation: 10324
I have a table with multiple entries per person_id
column. I'm storing a score
, a category_id
, and a created
column as well. So every time the person completes the indicated category I store a single record for them.
Now I'm trying to write a query that says: Just using the most recently created score for each person, find out how many people scored worse than I did for a specific category. I'm basically doing a percentile calculation here. So to get the total number scores I'm doing:
select count(distinct person_id) from performances where category_id = 7;
I'm not sure how to write the second query though, that finds out how many people did worse than me. Is this somewhere I'd use that "OVER PARTITION" type windowing function?
Upvotes: 0
Views: 1779
Reputation: 656804
Just using the most recently created score for each person
... translates to:
SELECT DISTINCT ON (person_id) *
FROM performances
ORDER BY person_id, created DESC;
Do not add a WHERE
condition here (yet) or you get different (incorrect) results.
Details for DISTINCT ON
:
find out how many people scored worse than I did for a specific category.
... translates to:
SELECT *
, dense_rank() OVER w AS worse_than_me
, ntile(100) OVER w AS percentile
FROM (
SELECT DISTINCT ON (person_id) *
FROM performances
ORDER BY person_id, created DESC
) p
WINDOW w AS (PARTITION BY category_id ORDER BY score);
Assuming "worse" means a lower score.
The window function dense_rank()
is the right tool that answers the question "How many people?" - as opposed to rank()
which answers "How many distinct scores?".
ntile(100)
over the same window definition gives you the ready percentile as integer, 100
meaning in the top 1 %, 99
meaning in the 2nd best % etc.
However, ntile()
returns, per documentation:
integer ranging from 1 to the argument value, dividing the partition as equally as possible
That means, if you should have less than 100 rows in your partition (like you commented), multiple by 100.0 / count(*)
to scale the number. A "percentile" is not the most useful statistic for just a hand full of rows in a set, it's typically used on big sets.
But quite possibly meant to ask:
"How does each person rank in the category (s)he finished last among all other results in that category?"
Assuming unique entries for (person_id, category_id)
, or you also have to define how to deal with multiple results per person in the same category (including self).
SELECT *
FROM (
SELECT DISTINCT ON (person_id) *
FROM performances
ORDER BY person_id, created DESC
) pers
JOIN (
SELECT person_id, category_id
, dense_rank() OVER w AS worse_than_me
, ntile(100) OVER w AS percentile
FROM performances
WINDOW w AS (PARTITION BY category_id ORDER BY score)
) rnk f USING (person_id, category_id);
pers
we distill the last entry per person (the one of interest).rnk
we get ranking and percentile compared to all other entries.JOIN
with the USING
clause, and you got a ready SELECT
list without duplicate columns.but would also make more sense if there can be multiple entries per (person_id, category_id)
:
"Get the rank for the latest score of each person in each category compared to all other latest personal scores in the same category."
SELECT *
, dense_rank() OVER w AS worse_than_me
, ntile(100) OVER w AS percentile
FROM (
SELECT DISTINCT ON (person_id, category_id) *
FROM performances
ORDER BY person_id, category_id, created DESC;
) p
WINDOW w AS (PARTITION BY category_id ORDER BY score);
Unclear / ambiguous questions lead to arbitrary results. The first step to a solution is to define the task clearly.
Upvotes: 3
Reputation: 125254
select
person_id,
count(*) over() as total_person,
rank() over(order by score desc) as score_rank
from (
select distinct on (person_id) *
from score
where category_id = 7
order by person_id, created desc
) s
Check rank
, dense_rank
, percent_rank
, ntile
, and cume_dist
:
http://www.postgresql.org/docs/current/static/functions-window.html
distinct on
returns a single row from each of the person_id
s. Using the order by
clause it is possible to choose each one.
Upvotes: 1