Reputation: 1752

How do I select the max score from each distinct user in this table?

I have the following table (scores):

id  user  date         score  
---|-----|------------|--------
1  | 10  | 11/01/2016 | 400   
2  | 10  | 11/03/2016 | 450 
5  | 17  | 10/03/2016 | 305  
3  | 13  | 09/03/2016 | 120   
4  | 17  | 11/03/2016 | 300 
6  | 13  | 08/03/2016 | 120  
7  | 13  | 11/12/2016 | 120  
8  | 13  | 09/01/2016 | 110

I want to select max(score) for each distinct user, using date as a tie-breaker (in the event of a tie, the most recent record should be returned) such that the results look like the following (top score for each user, sorted by score in descending order):

id  user  date         score  
---|-----|------------|--------
2  | 10  | 11/03/2016 | 450   
5  | 17  | 10/03/2016 | 305  
7  | 13  | 11/12/2016 | 120

I'm using Postgres and I am not a SQL expert by any means. I've tried something similar to the following, which doesn't work because I don't have the id column included in the group by:

select scores.user, max(scores.score) as score, scores.id
from scores
group by scores.user
order by score desc

I have a feeling I need to do a sub-select, but I can't get the join to work correctly. I found How can I SELECT rows with MAX(Column value), DISTINCT by another column in SQL? but I can't seem to make any of the solutions work for me because I need to return the row's id and I have the possibility of a tie on the date column.

Upvotes: 4

Answers (4)

Hồ Ngọc Triển

Reputation: 29

For mysql query

select sr, id, user, date, MAX(score) score
from the_table
group by user
order by score desc;

Upvotes: 1

user330315

Reputation:

In Postgres typically the fastest method is to use distinct on ()

select distinct on (user_id) *
from the_table
order by user_id, score desc;

That is definitely a lot faster then any solution using a sub-query with max() and usually still a bit faster then an equivalent solution using a window function (e.g. row_number())

^{I used user_id for the column name because user is a reserved word and I strongly recommend to not use that.}

Upvotes: 5

hongqing lv

Reputation: 1

this way

create table test (
  id int,
  "user" int,
  "date" date,
  score int
);

insert into test values 
(1  , 10  , '11/01/2016' , 400   )
,(2  , 10  , '11/03/2016' , 450 )
,(5  , 17  , '10/03/2016' , 305  )
,(3  , 13  , '09/03/2016' , 120   )
,(4  , 17  , '11/03/2016' , 300 )
,(6  , 13  , '08/03/2016' , 120  )
,(7  , 13  , '11/12/2016' , 120  )
,(8  , 13  , '09/01/2016' , 110);


select * from test where id in (
select distinct(first_value(id)
 over(
partition by "user" order by score desc 
))
from test
)

Upvotes: 0

zedfoxus

Reputation: 37049

Try this:

with 

-- get maximum scores by user
maxscores as (
  select "user", max(score) as maxscore
  from test
  group by "user"
),

-- find the maximum date as the tie-breaker along with the above information
maxdates as (
  select t."user", mx.maxscore, max(t."date") as maxdate
  from test t
  inner join maxscores mx 
    on mx."user" = t."user" 
    and mx.maxscore = t.score
  group by t."user", mx.maxscore
)

-- select all columns based on the results of maxdates
select t.*
from test t
inner join maxdates md
  on md."user" = t."user"
  and md.maxscore = t.score
  and md.maxdate = t."date";

Explanation

With CTE maxdates, let's find the maximum score by each user
Go back to the table. Get records that match the user and max score. Get maximum date for that user/score combination
Go back to the table. Get rows that match user, max score and max date we retrieved

Example:

http://sqlfiddle.com/#!15/0f756/8 - without row_number

http://sqlfiddle.com/#!15/0f756/13 - with row_number

Feel free to change the query as you desire.

Test case

create table test (
  id int,
  "user" int,
  "date" date,
  score int
);

insert into test values 
(1  , 10  , '11/01/2016' , 400   )
,(2  , 10  , '11/03/2016' , 450 )
,(5  , 17  , '10/03/2016' , 305  )
,(3  , 13  , '09/03/2016' , 120   )
,(4  , 17  , '11/03/2016' , 300 )
,(6  , 13  , '08/03/2016' , 120  )
,(7  , 13  , '11/12/2016' , 120  )
,(8  , 13  , '09/01/2016' , 110);

Result

| id | user |                       date | score |
|----|------|----------------------------|-------|
|  2 |   10 | November, 03 2016 00:00:00 |   450 |
|  5 |   17 |  October, 03 2016 00:00:00 |   305 |
|  7 |   13 | November, 12 2016 00:00:00 |   120 |

Risk

If you have two records with the same score and date for user 13 (for example), you will get 2 records user 13.

Example of the risk: http://sqlfiddle.com/#!15/cb86e/1

To mitigate the risk, you could use row_number() over() like so:

with
rankeddata as (
  select row_number() over (
    partition by
      "user"
    order by 
      "user", 
      score desc, 
      "date" desc) as sr,
    t.*
  from test t
)
select * from rankeddata where sr = 1;

Result of mitigated risk

| sr | id | user |                       date | score |
|----|----|------|----------------------------|-------|
|  1 |  2 |   10 | November, 03 2016 00:00:00 |   450 |
|  1 |  7 |   13 | November, 12 2016 00:00:00 |   120 |
|  1 |  5 |   17 |  October, 03 2016 00:00:00 |   305 |

Upvotes: 1

How do I select the max score from each distinct user in this table?

Answers (4)

Related Questions