Reputation: 9899

How can I SELECT rows with MAX(Column value), PARTITION by another column in MYSQL?

I have a table of player performance:

CREATE TABLE TopTen (
  id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
  home INT UNSIGNED NOT NULL,
  `datetime`DATETIME NOT NULL,
  player VARCHAR(6) NOT NULL,
  resource INT NOT NULL
);

What query will return the rows for each distinct home holding its maximum value of datetime? In other words, how can I filter by the maximum datetime (grouped by home) and still include other non-grouped, non-aggregate columns (such as player) in the result?

For this sample data:

INSERT INTO TopTen
  (id, home, `datetime`, player, resource)
VALUES
  (1, 10, '04/03/2009', 'john', 399),
  (2, 11, '04/03/2009', 'juliet', 244),
  (5, 12, '04/03/2009', 'borat', 555),
  (3, 10, '03/03/2009', 'john', 300),
  (4, 11, '03/03/2009', 'juliet', 200),
  (6, 12, '03/03/2009', 'borat', 500),
  (7, 13, '24/12/2008', 'borat', 600),
  (8, 13, '01/01/2009', 'borat', 700)
;

the result should be:

id	home	datetime	player	resource
1	10	04/03/2009	john	399
2	11	04/03/2009	juliet	244
5	12	04/03/2009	borat	555
8	13	01/01/2009	borat	700

I tried a subquery getting the maximum datetime for each home:

-- 1 ..by the MySQL manual: 

SELECT DISTINCT
  home,
  id,
  datetime AS dt,
  player,
  resource
FROM TopTen t1
WHERE `datetime` = (SELECT
  MAX(t2.datetime)
FROM TopTen t2
GROUP BY home)
GROUP BY `datetime`
ORDER BY `datetime` DESC

The result-set has 130 rows although database holds 187, indicating the result includes some duplicates of home.

Then I tried joining to a subquery that gets the maximum datetime for each row id:

-- 2 ..join

SELECT
  s1.id,
  s1.home,
  s1.datetime,
  s1.player,
  s1.resource
FROM TopTen s1
JOIN (SELECT
  id,
  MAX(`datetime`) AS dt
FROM TopTen
GROUP BY id) AS s2
  ON s1.id = s2.id
ORDER BY `datetime`

Nope. Gives all the records.

I tried various exotic queries, each with various results, but nothing that got me any closer to solving this problem.

Upvotes: 919

Answers (22)

Kazi Mohammad Ali Nur Romel

Reputation: 15905

In MySQL 8.0 this can be achieved efficiently by using row_number() window function with common table expression.

(Here row_number() basically generating unique sequence for each row for every player starting with 1 in descending order of resource. So, for every player row with sequence number 1 will be with highest resource value. Now all we need to do is selecting row with sequence number 1 for each player. It can be done by writing an outer query around this query. But we used common table expression instead since it's more readable.)

Schema:

 create  TABLE TestTable(id INT, home INT, date DATETIME, 
   player VARCHAR(20), resource INT);
 INSERT INTO TestTable
 SELECT 1, 10, '2009-03-04', 'john', 399 UNION
 SELECT 2, 11, '2009-03-04', 'juliet', 244 UNION
 SELECT 5, 12, '2009-03-04', 'borat', 555 UNION
 SELECT 3, 10, '2009-03-03', 'john', 300 UNION
 SELECT 4, 11, '2009-03-03', 'juliet', 200 UNION
 SELECT 6, 12, '2009-03-03', 'borat', 500 UNION
 SELECT 7, 13, '2008-12-24', 'borat', 600 UNION
 SELECT 8, 13, '2009-01-01', 'borat', 700

Query:

 with cte as 
 (
     select id, home, date , player, resource, 
     Row_Number()Over(Partition by home order by date desc) rownumber from TestTable
 )
 select id, home, date , player, resource from cte where rownumber=1

Output:

id	home	date	player	resource
1	10	2009-03-04 00:00:00	john	399
2	11	2009-03-04 00:00:00	juliet	244
5	12	2009-03-04 00:00:00	borat	555
8	13	2009-01-01 00:00:00	borat	700

db<>fiddle here

Upvotes: 6

Neon Tetra

Reputation: 159

This works in SQLServer, and is the only solution I've seen that doesn't require subqueries or CTEs - I think this is the most elegant way to solve this kind of problem.

  SELECT TOP 1 WITH TIES *
    FROM TopTen
ORDER BY ROW_NUMBER() OVER (PARTITION BY home
                                ORDER BY [datetime] DESC)

In the ORDER BY clause, it uses a window function to generate & sort by a ROW_NUMBER - assigning a 1 value to the highest [datetime] for each [home].

SELECT TOP 1 WITH TIES will then select one record with the lowest ROW_NUMBER (which will be 1), as well as all records with a tying ROW_NUMBER (also 1)

As a consequence, you retrieve all data for each of the 1st ranked records - that is, all data for records with the highest [datetime] value with their given [home] value.

Upvotes: 3

Jeka Developer

Reputation: 151

Accepted answer doesn't work for me if there are 2 records with same date and home. It will return 2 records after join. While I need to select any (random) of them. This query is used as joined subquery so just limit 1 is not possible there. Here is how I reached desired result. Don't know about performance however.

select SUBSTRING_INDEX(GROUP_CONCAT(id order by datetime desc separator ','),',',1) as id, home, MAX(datetime) as 'datetime'
 from topten
 group by (home)

Upvotes: 0

Moradnejad

Reputation: 3663

(NOTE: The answer of Michael is perfect for a situation where the target column datetime cannot have duplicate values for each distinct home.)

If your table has duplicate rows for homexdatetime and you need to only select one row for each distinct home column, here is my solution to it:

Your table needs one unique column (like id). If it doesn't, create a view and add a random column to it.

Use this query to select a single row for each unique home value. Selects the lowest id in case of duplicate datetime.

SELECT tt.*
FROM topten tt
INNER JOIN
    (
    SELECT min(id) as min_id, home from topten tt2
    INNER JOIN 
        (
        SELECT home, MAX(datetime) AS MaxDateTime
        FROM topten
        GROUP BY home) groupedtt2
    ON tt2.home = groupedtt2.home
    ) as groupedtt
ON tt.id = groupedtt.id

Upvotes: 0

kiruba

Reputation: 139

Hope below query will give the desired output:

Select id, home,datetime,player,resource, row_number() over (Partition by home ORDER by datetime desc) as rownum from tablename where rownum=1

Upvotes: 1

axiac

Reputation: 72336

The fastest MySQL solution, without inner queries and without GROUP BY:

SELECT m.*                    -- get the row that contains the max value
FROM topten m                 -- "m" from "max"
    LEFT JOIN topten b        -- "b" from "bigger"
        ON m.home = b.home    -- match "max" row with "bigger" row by `home`
        AND m.datetime < b.datetime           -- want "bigger" than "max"
WHERE b.datetime IS NULL      -- keep only if there is no bigger than max

Explanation:

Join the table with itself using the home column. The use of LEFT JOIN ensures all the rows from table m appear in the result set. Those that don't have a match in table b will have NULLs for the columns of b.

The other condition on the JOIN asks to match only the rows from b that have bigger value on the datetime column than the row from m.

Using the data posted in the question, the LEFT JOIN will produce this pairs:

+------------------------------------------+--------------------------------+
|              the row from `m`            |    the matching row from `b`   |
|------------------------------------------|--------------------------------|
| id  home  datetime     player   resource | id    home   datetime      ... |
|----|-----|------------|--------|---------|------|------|------------|-----|
| 1  | 10  | 04/03/2009 | john   | 399     | NULL | NULL | NULL       | ... | *
| 2  | 11  | 04/03/2009 | juliet | 244     | NULL | NULL | NULL       | ... | *
| 5  | 12  | 04/03/2009 | borat  | 555     | NULL | NULL | NULL       | ... | *
| 3  | 10  | 03/03/2009 | john   | 300     | 1    | 10   | 04/03/2009 | ... |
| 4  | 11  | 03/03/2009 | juliet | 200     | 2    | 11   | 04/03/2009 | ... |
| 6  | 12  | 03/03/2009 | borat  | 500     | 5    | 12   | 04/03/2009 | ... |
| 7  | 13  | 24/12/2008 | borat  | 600     | 8    | 13   | 01/01/2009 | ... |
| 8  | 13  | 01/01/2009 | borat  | 700     | NULL | NULL | NULL       | ... | *
+------------------------------------------+--------------------------------+

Finally, the WHERE clause keeps only the pairs that have NULLs in the columns of b (they are marked with * in the table above); this means, due to the second condition from the JOIN clause, the row selected from m has the biggest value in column datetime.

Read the SQL Antipatterns: Avoiding the Pitfalls of Database Programming book for other SQL tips.

Upvotes: 154

Michael La Voie

Reputation: 27926

You are so close! All you need to do is select BOTH the home and its max date time, then join back to the topten table on BOTH fields:

SELECT tt.*
FROM topten tt
INNER JOIN
    (SELECT home, MAX(datetime) AS MaxDateTime
    FROM topten
    GROUP BY home) groupedtt 
ON tt.home = groupedtt.home 
AND tt.datetime = groupedtt.MaxDateTime

Upvotes: 1108

M Khalid Junaid

Reputation: 64486

Another way to gt the most recent row per group using a sub query which basically calculates a rank for each row per group and then filter out your most recent rows as with rank = 1

select a.*
from topten a
where (
  select count(*)
  from topten b
  where a.home = b.home
  and a.`datetime` < b.`datetime`
) +1 = 1

DEMO

Here is the visual demo for rank no for each row for better understanding

By reading some comments what about if there are two rows which have same 'home' and 'datetime' field values?

Above query will fail and will return more than 1 rows for above situation. To cover up this situation there will be a need of another criteria/parameter/column to decide which row should be taken which falls in above situation. By viewing sample data set i assume there is a primary key column id which should be set to auto increment. So we can use this column to pick the most recent row by tweaking same query with the help of CASE statement like

select a.*
from topten a
where (
  select count(*)
  from topten b
  where a.home = b.home
  and  case 
       when a.`datetime` = b.`datetime`
       then a.id < b.id
       else a.`datetime` < b.`datetime`
       end
) + 1 = 1

DEMO

Above query will pick the row with highest id among the same datetime values

visual demo for rank no for each row

Upvotes: 5

Ninja kargeti

Reputation: 161

@Michae The accepted answer will working fine in most of the cases but it fail for one for as below.

In case if there were 2 rows having HomeID and Datetime same the query will return both rows, not distinct HomeID as required, for that add Distinct in query as below.

SELECT DISTINCT tt.home  , tt.MaxDateTime
FROM topten tt
INNER JOIN
    (SELECT home, MAX(datetime) AS MaxDateTime
    FROM topten
    GROUP BY home) groupedtt 
ON tt.home = groupedtt.home 
AND tt.datetime = groupedtt.MaxDateTime

Upvotes: 2

Vijunav Vastivch

Reputation: 4211

this is the query you need:

 SELECT b.id, a.home,b.[datetime],b.player,a.resource FROM
 (SELECT home,MAX(resource) AS resource FROM tbl_1 GROUP BY home) AS a

 LEFT JOIN

 (SELECT id,home,[datetime],player,resource FROM tbl_1) AS b
 ON  a.resource = b.resource WHERE a.home =b.home;

Upvotes: 1

Ricardo Felgueiras

Reputation: 3719

I think this will give you the desired result:

SELECT   home, MAX(datetime)
FROM     my_table
GROUP BY home

BUT if you need other columns as well, just make a join with the original table (check Michael La Voie answer)

Best regards.

Upvotes: 21

Roland

Reputation: 73

Why not using: SELECT home, MAX(datetime) AS MaxDateTime,player,resource FROM topten GROUP BY home Did I miss something?

Upvotes: 3

Shiva

Reputation: 651

You can also try this one and for large tables query performance will be better. It works when there no more than two records for each home and their dates are different. Better general MySQL query is one from Michael La Voie above.

SELECT t1.id, t1.home, t1.date, t1.player, t1.resource
FROM   t_scores_1 t1 
INNER JOIN t_scores_1 t2
   ON t1.home = t2.home
WHERE t1.date > t2.date

Or in case of Postgres or those dbs that provide analytic functions try

SELECT t.* FROM 
(SELECT t1.id, t1.home, t1.date, t1.player, t1.resource
  , row_number() over (partition by t1.home order by t1.date desc) rw
 FROM   topten t1 
 INNER JOIN topten t2
   ON t1.home = t2.home
 WHERE t1.date > t2.date 
) t
WHERE t.rw = 1

Upvotes: 11

Jason Heo

Reputation: 10246

Here is MySQL version which prints only one entry where there are duplicates MAX(datetime) in a group.

You could test here http://www.sqlfiddle.com/#!2/0a4ae/1

Sample Data

mysql> SELECT * from topten;
+------+------+---------------------+--------+----------+
| id   | home | datetime            | player | resource |
+------+------+---------------------+--------+----------+
|    1 |   10 | 2009-04-03 00:00:00 | john   |      399 |
|    2 |   11 | 2009-04-03 00:00:00 | juliet |      244 |
|    3 |   10 | 2009-03-03 00:00:00 | john   |      300 |
|    4 |   11 | 2009-03-03 00:00:00 | juliet |      200 |
|    5 |   12 | 2009-04-03 00:00:00 | borat  |      555 |
|    6 |   12 | 2009-03-03 00:00:00 | borat  |      500 |
|    7 |   13 | 2008-12-24 00:00:00 | borat  |      600 |
|    8 |   13 | 2009-01-01 00:00:00 | borat  |      700 |
|    9 |   10 | 2009-04-03 00:00:00 | borat  |      700 |
|   10 |   11 | 2009-04-03 00:00:00 | borat  |      700 |
|   12 |   12 | 2009-04-03 00:00:00 | borat  |      700 |
+------+------+---------------------+--------+----------+

MySQL Version with User variable

SELECT *
FROM (
    SELECT ord.*,
        IF (@prev_home = ord.home, 0, 1) AS is_first_appear,
        @prev_home := ord.home
    FROM (
        SELECT t1.id, t1.home, t1.player, t1.resource
        FROM topten t1
        INNER JOIN (
            SELECT home, MAX(datetime) AS mx_dt
            FROM topten
            GROUP BY home
          ) x ON t1.home = x.home AND t1.datetime = x.mx_dt
        ORDER BY home
    ) ord, (SELECT @prev_home := 0, @seq := 0) init
) y
WHERE is_first_appear = 1;
+------+------+--------+----------+-----------------+------------------------+
| id   | home | player | resource | is_first_appear | @prev_home := ord.home |
+------+------+--------+----------+-----------------+------------------------+
|    9 |   10 | borat  |      700 |               1 |                     10 |
|   10 |   11 | borat  |      700 |               1 |                     11 |
|   12 |   12 | borat  |      700 |               1 |                     12 |
|    8 |   13 | borat  |      700 |               1 |                     13 |
+------+------+--------+----------+-----------------+------------------------+
4 rows in set (0.00 sec)

Accepted Answers' outout

SELECT tt.*
FROM topten tt
INNER JOIN
    (
    SELECT home, MAX(datetime) AS MaxDateTime
    FROM topten
    GROUP BY home
) groupedtt ON tt.home = groupedtt.home AND tt.datetime = groupedtt.MaxDateTime
+------+------+---------------------+--------+----------+
| id   | home | datetime            | player | resource |
+------+------+---------------------+--------+----------+
|    1 |   10 | 2009-04-03 00:00:00 | john   |      399 |
|    2 |   11 | 2009-04-03 00:00:00 | juliet |      244 |
|    5 |   12 | 2009-04-03 00:00:00 | borat  |      555 |
|    8 |   13 | 2009-01-01 00:00:00 | borat  |      700 |
|    9 |   10 | 2009-04-03 00:00:00 | borat  |      700 |
|   10 |   11 | 2009-04-03 00:00:00 | borat  |      700 |
|   12 |   12 | 2009-04-03 00:00:00 | borat  |      700 |
+------+------+---------------------+--------+----------+
7 rows in set (0.00 sec)

Upvotes: 6

SysDragon

Reputation: 9888

Try this for SQL Server:

WITH cte AS (
   SELECT home, MAX(year) AS year FROM Table1 GROUP BY home
)
SELECT * FROM Table1 a INNER JOIN cte ON a.home = cte.home AND a.year = cte.year

Upvotes: 8

Jr.

Reputation: 124

SELECT c1, c2, c3, c4, c5 FROM table1 WHERE c3 = (select max(c3) from table)

SELECT * FROM table1 WHERE c3 = (select max(c3) from table1)

Upvotes: 5

MJB

Reputation: 209

Since people seem to keep running into this thread (comment date ranges from 1.5 year) isn't this much simpler:

SELECT * FROM (SELECT * FROM topten ORDER BY datetime DESC) tmp GROUP BY home

No aggregation functions needed...

Cheers.

Upvotes: 16

Kaptah

Reputation: 9899

SELECT  tt.*
FROM    TestTable tt 
INNER JOIN 
        (
        SELECT  coord, MAX(datetime) AS MaxDateTime 
        FROM    rapsa 
        GROUP BY
                krd 
        ) groupedtt
ON      tt.coord = groupedtt.coord
        AND tt.datetime = groupedtt.MaxDateTime

Upvotes: 9

Quassnoi

Reputation: 425723

This will work even if you have two or more rows for each home with equal DATETIME's:

SELECT id, home, datetime, player, resource
FROM   (
       SELECT (
              SELECT  id
              FROM    topten ti
              WHERE   ti.home = t1.home
              ORDER BY
                      ti.datetime DESC
              LIMIT 1
              ) lid
       FROM   (
              SELECT  DISTINCT home
              FROM    topten
              ) t1
       ) ro, topten t2
WHERE  t2.id = ro.lid

Upvotes: 30

Maksym Gontar

Reputation: 22775

Here goes T-SQL version:

-- Test data
DECLARE @TestTable TABLE (id INT, home INT, date DATETIME, 
  player VARCHAR(20), resource INT)
INSERT INTO @TestTable
SELECT 1, 10, '2009-03-04', 'john', 399 UNION
SELECT 2, 11, '2009-03-04', 'juliet', 244 UNION
SELECT 5, 12, '2009-03-04', 'borat', 555 UNION
SELECT 3, 10, '2009-03-03', 'john', 300 UNION
SELECT 4, 11, '2009-03-03', 'juliet', 200 UNION
SELECT 6, 12, '2009-03-03', 'borat', 500 UNION
SELECT 7, 13, '2008-12-24', 'borat', 600 UNION
SELECT 8, 13, '2009-01-01', 'borat', 700

-- Answer
SELECT id, home, date, player, resource 
FROM (SELECT id, home, date, player, resource, 
    RANK() OVER (PARTITION BY home ORDER BY date DESC) N
    FROM @TestTable
)M WHERE N = 1

-- and if you really want only home with max date
SELECT T.id, T.home, T.date, T.player, T.resource 
    FROM @TestTable T
INNER JOIN 
(   SELECT TI.id, TI.home, TI.date, 
        RANK() OVER (PARTITION BY TI.home ORDER BY TI.date) N
    FROM @TestTable TI
    WHERE TI.date IN (SELECT MAX(TM.date) FROM @TestTable TM)
)TJ ON TJ.N = 1 AND T.id = TJ.id

EDIT
Unfortunately, there are no RANK() OVER function in MySQL.
But it can be emulated, see Emulating Analytic (AKA Ranking) Functions with MySQL.
So this is MySQL version:

SELECT id, home, date, player, resource 
FROM TestTable AS t1 
WHERE 
    (SELECT COUNT(*) 
            FROM TestTable AS t2 
            WHERE t2.home = t1.home AND t2.date > t1.date
    ) = 0

Upvotes: 81

FerranB

Reputation: 36827

This works on Oracle:

with table_max as(
  select id
       , home
       , datetime
       , player
       , resource
       , max(home) over (partition by home) maxhome
    from table  
)
select id
     , home
     , datetime
     , player
     , resource
  from table_max
 where home = maxhome

Upvotes: 8

Khb

Reputation: 1433

Try this

select * from mytable a join
(select home, max(datetime) datetime
from mytable
group by home) b
 on a.home = b.home and a.datetime = b.datetime

Regards K