user2719094
user2719094

Reputation: 1752

How do I select the max score from each distinct user in this table?

I have the following table (scores):

id  user  date         score  
---|-----|------------|--------
1  | 10  | 11/01/2016 | 400   
2  | 10  | 11/03/2016 | 450 
5  | 17  | 10/03/2016 | 305  
3  | 13  | 09/03/2016 | 120   
4  | 17  | 11/03/2016 | 300 
6  | 13  | 08/03/2016 | 120  
7  | 13  | 11/12/2016 | 120  
8  | 13  | 09/01/2016 | 110

I want to select max(score) for each distinct user, using date as a tie-breaker (in the event of a tie, the most recent record should be returned) such that the results look like the following (top score for each user, sorted by score in descending order):

id  user  date         score  
---|-----|------------|--------
2  | 10  | 11/03/2016 | 450   
5  | 17  | 10/03/2016 | 305  
7  | 13  | 11/12/2016 | 120 

I'm using Postgres and I am not a SQL expert by any means. I've tried something similar to the following, which doesn't work because I don't have the id column included in the group by:

select scores.user, max(scores.score) as score, scores.id
from scores
group by scores.user
order by score desc

I have a feeling I need to do a sub-select, but I can't get the join to work correctly. I found How can I SELECT rows with MAX(Column value), DISTINCT by another column in SQL? but I can't seem to make any of the solutions work for me because I need to return the row's id and I have the possibility of a tie on the date column.

Upvotes: 4

Views: 12186

Answers (4)

Hồ Ngọc Triển
Hồ Ngọc Triển

Reputation: 29

For mysql query

select sr, id, user, date, MAX(score) score
from the_table
group by user
order by score desc;

Upvotes: 1

user330315
user330315

Reputation:

In Postgres typically the fastest method is to use distinct on ()

select distinct on (user_id) *
from the_table
order by user_id, score desc;

That is definitely a lot faster then any solution using a sub-query with max() and usually still a bit faster then an equivalent solution using a window function (e.g. row_number())


I used user_id for the column name because user is a reserved word and I strongly recommend to not use that.

Upvotes: 5

hongqing lv
hongqing lv

Reputation: 1

this way

create table test (
  id int,
  "user" int,
  "date" date,
  score int
);

insert into test values 
(1  , 10  , '11/01/2016' , 400   )
,(2  , 10  , '11/03/2016' , 450 )
,(5  , 17  , '10/03/2016' , 305  )
,(3  , 13  , '09/03/2016' , 120   )
,(4  , 17  , '11/03/2016' , 300 )
,(6  , 13  , '08/03/2016' , 120  )
,(7  , 13  , '11/12/2016' , 120  )
,(8  , 13  , '09/01/2016' , 110);


select * from test where id in (
select distinct(first_value(id)
 over(
partition by "user" order by score desc 
))
from test
)

Upvotes: 0

zedfoxus
zedfoxus

Reputation: 37049

Try this:

with 

-- get maximum scores by user
maxscores as (
  select "user", max(score) as maxscore
  from test
  group by "user"
),

-- find the maximum date as the tie-breaker along with the above information
maxdates as (
  select t."user", mx.maxscore, max(t."date") as maxdate
  from test t
  inner join maxscores mx 
    on mx."user" = t."user" 
    and mx.maxscore = t.score
  group by t."user", mx.maxscore
)

-- select all columns based on the results of maxdates
select t.*
from test t
inner join maxdates md
  on md."user" = t."user"
  and md.maxscore = t.score
  and md.maxdate = t."date";

Explanation

  • With CTE maxdates, let's find the maximum score by each user
  • Go back to the table. Get records that match the user and max score. Get maximum date for that user/score combination
  • Go back to the table. Get rows that match user, max score and max date we retrieved

Example:

http://sqlfiddle.com/#!15/0f756/8 - without row_number

http://sqlfiddle.com/#!15/0f756/13 - with row_number

Feel free to change the query as you desire.

Test case

create table test (
  id int,
  "user" int,
  "date" date,
  score int
);

insert into test values 
(1  , 10  , '11/01/2016' , 400   )
,(2  , 10  , '11/03/2016' , 450 )
,(5  , 17  , '10/03/2016' , 305  )
,(3  , 13  , '09/03/2016' , 120   )
,(4  , 17  , '11/03/2016' , 300 )
,(6  , 13  , '08/03/2016' , 120  )
,(7  , 13  , '11/12/2016' , 120  )
,(8  , 13  , '09/01/2016' , 110);

Result

| id | user |                       date | score |
|----|------|----------------------------|-------|
|  2 |   10 | November, 03 2016 00:00:00 |   450 |
|  5 |   17 |  October, 03 2016 00:00:00 |   305 |
|  7 |   13 | November, 12 2016 00:00:00 |   120 |

Risk

If you have two records with the same score and date for user 13 (for example), you will get 2 records user 13.

Example of the risk: http://sqlfiddle.com/#!15/cb86e/1

To mitigate the risk, you could use row_number() over() like so:

with
rankeddata as (
  select row_number() over (
    partition by
      "user"
    order by 
      "user", 
      score desc, 
      "date" desc) as sr,
    t.*
  from test t
)
select * from rankeddata where sr = 1;

Result of mitigated risk

| sr | id | user |                       date | score |
|----|----|------|----------------------------|-------|
|  1 |  2 |   10 | November, 03 2016 00:00:00 |   450 |
|  1 |  7 |   13 | November, 12 2016 00:00:00 |   120 |
|  1 |  5 |   17 |  October, 03 2016 00:00:00 |   305 |

Upvotes: 1

Related Questions