Reputation: 1752
I have the following table (scores
):
id user date score
---|-----|------------|--------
1 | 10 | 11/01/2016 | 400
2 | 10 | 11/03/2016 | 450
5 | 17 | 10/03/2016 | 305
3 | 13 | 09/03/2016 | 120
4 | 17 | 11/03/2016 | 300
6 | 13 | 08/03/2016 | 120
7 | 13 | 11/12/2016 | 120
8 | 13 | 09/01/2016 | 110
I want to select max(score)
for each distinct user, using date
as a tie-breaker (in the event of a tie, the most recent record should be returned) such that the results look like the following (top score for each user, sorted by score
in descending order):
id user date score
---|-----|------------|--------
2 | 10 | 11/03/2016 | 450
5 | 17 | 10/03/2016 | 305
7 | 13 | 11/12/2016 | 120
I'm using Postgres and I am not a SQL expert by any means. I've tried something similar to the following, which doesn't work because I don't have the id
column included in the group by
:
select scores.user, max(scores.score) as score, scores.id
from scores
group by scores.user
order by score desc
I have a feeling I need to do a sub-select, but I can't get the join to work correctly. I found How can I SELECT rows with MAX(Column value), DISTINCT by another column in SQL? but I can't seem to make any of the solutions work for me because I need to return the row's id
and I have the possibility of a tie on the date
column.
Upvotes: 4
Views: 12186
Reputation: 29
For mysql query
select sr, id, user, date, MAX(score) score
from the_table
group by user
order by score desc;
Upvotes: 1
Reputation:
In Postgres typically the fastest method is to use distinct on ()
select distinct on (user_id) *
from the_table
order by user_id, score desc;
That is definitely a lot faster then any solution using a sub-query with max()
and usually still a bit faster then an equivalent solution using a window function (e.g. row_number()
)
I used user_id
for the column name because user
is a reserved word and I strongly recommend to not use that.
Upvotes: 5
Reputation: 1
this way
create table test (
id int,
"user" int,
"date" date,
score int
);
insert into test values
(1 , 10 , '11/01/2016' , 400 )
,(2 , 10 , '11/03/2016' , 450 )
,(5 , 17 , '10/03/2016' , 305 )
,(3 , 13 , '09/03/2016' , 120 )
,(4 , 17 , '11/03/2016' , 300 )
,(6 , 13 , '08/03/2016' , 120 )
,(7 , 13 , '11/12/2016' , 120 )
,(8 , 13 , '09/01/2016' , 110);
select * from test where id in (
select distinct(first_value(id)
over(
partition by "user" order by score desc
))
from test
)
Upvotes: 0
Reputation: 37049
Try this:
with
-- get maximum scores by user
maxscores as (
select "user", max(score) as maxscore
from test
group by "user"
),
-- find the maximum date as the tie-breaker along with the above information
maxdates as (
select t."user", mx.maxscore, max(t."date") as maxdate
from test t
inner join maxscores mx
on mx."user" = t."user"
and mx.maxscore = t.score
group by t."user", mx.maxscore
)
-- select all columns based on the results of maxdates
select t.*
from test t
inner join maxdates md
on md."user" = t."user"
and md.maxscore = t.score
and md.maxdate = t."date";
Explanation
Example:
http://sqlfiddle.com/#!15/0f756/8 - without row_number
http://sqlfiddle.com/#!15/0f756/13 - with row_number
Feel free to change the query as you desire.
Test case
create table test (
id int,
"user" int,
"date" date,
score int
);
insert into test values
(1 , 10 , '11/01/2016' , 400 )
,(2 , 10 , '11/03/2016' , 450 )
,(5 , 17 , '10/03/2016' , 305 )
,(3 , 13 , '09/03/2016' , 120 )
,(4 , 17 , '11/03/2016' , 300 )
,(6 , 13 , '08/03/2016' , 120 )
,(7 , 13 , '11/12/2016' , 120 )
,(8 , 13 , '09/01/2016' , 110);
Result
| id | user | date | score |
|----|------|----------------------------|-------|
| 2 | 10 | November, 03 2016 00:00:00 | 450 |
| 5 | 17 | October, 03 2016 00:00:00 | 305 |
| 7 | 13 | November, 12 2016 00:00:00 | 120 |
Risk
If you have two records with the same score and date for user 13 (for example), you will get 2 records user 13.
Example of the risk: http://sqlfiddle.com/#!15/cb86e/1
To mitigate the risk, you could use row_number() over()
like so:
with
rankeddata as (
select row_number() over (
partition by
"user"
order by
"user",
score desc,
"date" desc) as sr,
t.*
from test t
)
select * from rankeddata where sr = 1;
Result of mitigated risk
| sr | id | user | date | score |
|----|----|------|----------------------------|-------|
| 1 | 2 | 10 | November, 03 2016 00:00:00 | 450 |
| 1 | 7 | 13 | November, 12 2016 00:00:00 | 120 |
| 1 | 5 | 17 | October, 03 2016 00:00:00 | 305 |
Upvotes: 1