Reputation: 183559
How do you get the rows that contain the max value for each grouped set?
I've seen some overly complicated variations on this question, and none with a good answer. I've tried to put together the simplest possible example:
Given a table like the one below, with person, group, and age columns, how would you get the oldest person in each group? (A tie within a group should give the first alphabetical result.)
Person | Group | Age
-------|-------|----
Bob    | 1     | 32
Jill   | 1     | 34
Shawn  | 1     | 42
Jake   | 2     | 29
Paul   | 2     | 36
Laura  | 2     | 39
Desired result set:
Shawn  | 1     | 42
Laura  | 2     | 39
Upvotes: 363
Views: 406406
Reputation: 455
I retired in 2014, and my memory of Microsoft SQL is a bit rusty. I found this question while looking for a solution to a challenge in MySQL. When I adapted the answer that was said to be portable to my table and column names, MySQL complained that it wasn't valid, with no further explanation. Several variations got the same unhelpful complaint. Finally, I decided to try my own much simpler instinct, and it worked. To get the most recent change for each xref from a list of changes, I use:
SELECT xref, MAX(change_time) AS last_change
FROM ug_change
GROUP BY xref;
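To see what this query does and does not return, here is a minimal sketch that runs the same GROUP BY / MAX shape against an in-memory SQLite copy of the question's sample data, using Python's sqlite3 module. The table name persons and the column grp (standing in for the reserved word GROUP, with age playing the role of change_time and grp the role of xref) are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical copy of the question's data; "grp" stands in for the reserved word GROUP.
conn.execute("CREATE TABLE persons (person TEXT, grp INTEGER, age INTEGER)")
conn.executemany(
    "INSERT INTO persons VALUES (?, ?, ?)",
    [("Bob", 1, 32), ("Jill", 1, 34), ("Shawn", 1, 42),
     ("Jake", 2, 29), ("Paul", 2, 36), ("Laura", 2, 39)],
)

# Same shape as the query above: one row per group with its maximum value.
max_per_group = conn.execute(
    "SELECT grp, MAX(age) AS max_age FROM persons GROUP BY grp ORDER BY grp"
).fetchall()
print(max_per_group)  # [(1, 42), (2, 39)]
```

The result holds each group and its maximum age, but not who has that age. Getting the whole row, for example the person's name, requires joining the aggregate back to the table, as several other answers here do.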
Upvotes: 0
Reputation: 72226
The correct solution is:
SELECT o.*
FROM `Persons` o # 'o' from 'oldest person in group'
LEFT JOIN `Persons` b # 'b' from 'bigger age'
ON o.Group = b.Group AND o.Age < b.Age
WHERE b.Age is NULL # bigger age not found
It matches each row from o with all the rows from b that have the same value in column Group and a bigger value in column Age. Any row from o that does not have the maximum value of its group in column Age will match one or more rows from b.
The LEFT JOIN makes it match the oldest person in each group (including persons who are alone in their group) with a row full of NULLs from b ("no bigger age in the group").
Using INNER JOIN instead would leave these rows unmatched, so they would be dropped.
The WHERE clause keeps only the rows that have NULLs in the fields extracted from b. They are the oldest persons from each group.
This solution and many others are explained in the book SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming
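A minimal sketch of the LEFT JOIN anti-join described above. It runs on an in-memory SQLite copy of the question's data through Python's sqlite3 module rather than on MySQL. The table name persons and the column name grp (used because GROUP is a reserved word) are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical copy of the question's data; "grp" stands in for the reserved word GROUP.
conn.execute("CREATE TABLE persons (person TEXT, grp INTEGER, age INTEGER)")
conn.executemany(
    "INSERT INTO persons VALUES (?, ?, ?)",
    [("Bob", 1, 32), ("Jill", 1, 34), ("Shawn", 1, 42),
     ("Jake", 2, 29), ("Paul", 2, 36), ("Laura", 2, 39)],
)

# Pair every row o with every older row b of the same group; rows that find
# no partner (b.age IS NULL) hold the maximum age of their group.
oldest = conn.execute(
    """
    SELECT o.person, o.grp, o.age
    FROM persons o
    LEFT JOIN persons b
      ON o.grp = b.grp AND o.age < b.age
    WHERE b.age IS NULL
    ORDER BY o.grp, o.person
    """
).fetchall()
print(oldest)  # [('Shawn', 1, 42), ('Laura', 2, 39)]
```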
Upvotes: 437
Reputation: 1771
In PostgreSQL you can use DISTINCT ON clause:
SELECT DISTINCT ON ("group") * FROM "mytable" ORDER BY "group", "age" DESC, "person";
Upvotes: 36
Reputation: 446
SELECT o.*
FROM `Persons` o
LEFT JOIN `Persons` b
ON o.Group = b.Group AND o.Age < b.Age
WHERE b.Age is NULL
group by o.Group
Upvotes: 0
Reputation: 504
Improving on axiac's solution to avoid selecting multiple rows per group, while still allowing the use of indexes:
SELECT o.*
FROM `Persons` o
LEFT JOIN `Persons` b
ON o.Group = b.Group AND o.Age < b.Age
LEFT JOIN `Persons` c
ON o.Group = c.Group AND o.Age = c.Age and o.id < c.id
WHERE b.Age is NULL and c.id is null
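A sketch of this double LEFT JOIN in SQLite (through Python's sqlite3), on hypothetical data: the question's rows plus an id column and an extra person, Zoe, who ties Laura's age in group 2. The names persons and grp are assumptions. Because the second join discards any row that has a same-age peer with a larger id, it keeps the tied row with the largest id, not the alphabetically first name. The second query compares names in that join instead. This is an adaptation, not part of the answer, and follows the question's tie rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: the question's rows plus an id column and a tie (Zoe, 39) in group 2.
conn.execute("CREATE TABLE persons (id INTEGER PRIMARY KEY, person TEXT, grp INTEGER, age INTEGER)")
conn.executemany(
    "INSERT INTO persons VALUES (?, ?, ?, ?)",
    [(1, "Bob", 1, 32), (2, "Jill", 1, 34), (3, "Shawn", 1, 42),
     (4, "Jake", 2, 29), (5, "Paul", 2, 36), (6, "Laura", 2, 39),
     (7, "Zoe", 2, 39)],
)

# b removes rows that have an older person in the group;
# c removes rows that have a same-age person with a larger id.
by_id = conn.execute(
    """
    SELECT o.person, o.grp, o.age
    FROM persons o
    LEFT JOIN persons b ON o.grp = b.grp AND o.age < b.age
    LEFT JOIN persons c ON o.grp = c.grp AND o.age = c.age AND o.id < c.id
    WHERE b.age IS NULL AND c.id IS NULL
    ORDER BY o.grp
    """
).fetchall()
print(by_id)  # [('Shawn', 1, 42), ('Zoe', 2, 39)]

# Comparing names instead of ids keeps the alphabetically first person.
by_name = conn.execute(
    """
    SELECT o.person, o.grp, o.age
    FROM persons o
    LEFT JOIN persons b ON o.grp = b.grp AND o.age < b.age
    LEFT JOIN persons c ON o.grp = c.grp AND o.age = c.age AND c.person < o.person
    WHERE b.age IS NULL AND c.person IS NULL
    ORDER BY o.grp
    """
).fetchall()
print(by_name)  # [('Shawn', 1, 42), ('Laura', 2, 39)]
```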
Upvotes: 7
Reputation: 4957
Using a ranking method:
SELECT @rn := CASE WHEN @prev_grp <> groupa THEN 1 ELSE @rn+1 END AS rn,
@prev_grp :=groupa,
person,age,groupa
FROM users,(SELECT @rn := 0, @prev_grp := NULL) r
HAVING rn=1
ORDER BY groupa,age DESC,person
This SQL can be explained as follows:
select * from users, (select @rn := 0) r order by groupa, age desc, person sorts the rows so that the wanted row comes first in each group.
@prev_grp starts out as NULL.
@rn := CASE WHEN @prev_grp <> groupa THEN 1 ELSE @rn+1 END is a conditional expression that works like this: rn = 1 if prev_grp != groupa else rn = rn + 1.
HAVING rn=1 keeps only the first row of each group, which is the row you need.
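The variable trick amounts to a running row counter over sorted rows. The plain-Python emulation below illustrates only that logic, not how MySQL actually evaluates user variables. prev_grp and rn mirror @prev_grp and @rn, and the row tuples are hypothetical copies of the question's data.

```python
# Hypothetical rows mirroring the question: (person, groupa, age).
rows = [("Bob", 1, 32), ("Jill", 1, 34), ("Shawn", 1, 42),
        ("Jake", 2, 29), ("Paul", 2, 36), ("Laura", 2, 39)]

def first_per_group(rows):
    """Emulate the @rn / @prev_grp counter in plain Python."""
    # ORDER BY groupa, age DESC, person
    ordered = sorted(rows, key=lambda r: (r[1], -r[2], r[0]))
    prev_grp = None  # like @prev_grp, which starts out NULL
    rn = 0           # like @rn := 0
    result = []
    for person, groupa, age in ordered:
        # rn = 1 if prev_grp != groupa else rn + 1
        rn = 1 if prev_grp != groupa else rn + 1
        prev_grp = groupa
        if rn == 1:  # HAVING rn = 1
            result.append((person, groupa, age))
    return result

print(first_per_group(rows))  # [('Shawn', 1, 42), ('Laura', 2, 39)]
```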
Upvotes: 4
Reputation: 139
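In Oracle, the query below gives the desired result. The row number has to be computed in a subquery, because its alias cannot be referenced in the WHERE clause of the same SELECT. The column group is quoted because GROUP is a reserved word.
SELECT "group", person, age
FROM (
SELECT "group", person, age,
ROW_NUMBER() OVER (PARTITION BY "group" ORDER BY age DESC, person ASC) AS rankForEachGroup
FROM tablename
)
WHERE rankForEachGroup = 1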
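The ROW_NUMBER() pattern can be tried without an Oracle instance. The sketch below runs it in SQLite through Python's sqlite3, which requires SQLite 3.25 or later for window functions. The table persons, the column grp, and the extra tied row (Zoe, 39) are hypothetical. The row number is computed in a subquery and filtered in the outer query, and person is the tie-breaker, so Laura wins the tie alphabetically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: the question's rows plus Zoe, who ties Laura at 39.
conn.execute("CREATE TABLE persons (person TEXT, grp INTEGER, age INTEGER)")
conn.executemany(
    "INSERT INTO persons VALUES (?, ?, ?)",
    [("Bob", 1, 32), ("Jill", 1, 34), ("Shawn", 1, 42),
     ("Jake", 2, 29), ("Paul", 2, 36), ("Laura", 2, 39), ("Zoe", 2, 39)],
)

# Number rows within each group (oldest first, then alphabetical),
# and keep row number 1 in the outer query.
oldest = conn.execute(
    """
    SELECT person, grp, age
    FROM (
        SELECT person, grp, age,
               ROW_NUMBER() OVER (PARTITION BY grp ORDER BY age DESC, person ASC) AS rn
        FROM persons
    )
    WHERE rn = 1
    ORDER BY grp
    """
).fetchall()
print(oldest)  # [('Shawn', 1, 42), ('Laura', 2, 39)]
```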
Upvotes: 2
Reputation: 270617
You can join against a subquery that pulls each Group and its MAX(Age). This method is portable across most RDBMS.
SELECT t1.*
FROM yourTable t1
INNER JOIN
(
SELECT `Group`, MAX(Age) AS max_age
FROM yourTable
GROUP BY `Group`
) t2
ON t1.`Group` = t2.`Group` AND t1.Age = t2.max_age;
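A sketch of this join in SQLite (through Python's sqlite3), on an assumed persons table with grp in place of the reserved word GROUP. It also adds a hypothetical tie (Zoe, 39). Matching on the maximum age returns every row that holds it, so a tie yields more than one row for that group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical copy of the question's data; "grp" stands in for the reserved word GROUP.
conn.execute("CREATE TABLE persons (person TEXT, grp INTEGER, age INTEGER)")
conn.executemany(
    "INSERT INTO persons VALUES (?, ?, ?)",
    [("Bob", 1, 32), ("Jill", 1, 34), ("Shawn", 1, 42),
     ("Jake", 2, 29), ("Paul", 2, 36), ("Laura", 2, 39)],
)

# Join every row back to its group's maximum age; only rows holding that age match.
QUERY = """
    SELECT t1.person, t1.grp, t1.age
    FROM persons t1
    INNER JOIN (
        SELECT grp, MAX(age) AS max_age
        FROM persons
        GROUP BY grp
    ) t2
      ON t1.grp = t2.grp AND t1.age = t2.max_age
    ORDER BY t1.grp, t1.person
"""
oldest = conn.execute(QUERY).fetchall()
print(oldest)  # [('Shawn', 1, 42), ('Laura', 2, 39)]

# A hypothetical tie: both people aged 39 in group 2 match the maximum.
conn.execute("INSERT INTO persons VALUES ('Zoe', 2, 39)")
with_tie = conn.execute(QUERY).fetchall()
print(with_tie)  # [('Shawn', 1, 42), ('Laura', 2, 39), ('Zoe', 2, 39)]
```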
Upvotes: 88
Reputation: 425033
There's a super-simple way to do this in MySQL:
select *
from (select * from mytable order by `Group`, age desc, Person) x
group by `Group`
This works because MySQL allows you to leave non-group-by columns unaggregated, in which case it just returns the first row. The solution is to first order the data so that the row you want comes first in each group, then group by the columns you want the value for.
You avoid complicated subqueries that try to find the max() etc., and also the problem of returning multiple rows when more than one row has the same maximum value (as the other answers would do).
Note: This is a MySQL-only solution. All other databases I know of will throw an SQL syntax error with the message "non aggregated columns are not listed in the group by clause" or similar. Because this solution relies on undocumented behavior, the more cautious may want to include a test asserting that it still works, in case a future version of MySQL changes this behavior.
Since version 5.7, the sql_mode setting includes ONLY_FULL_GROUP_BY by default, so to make this work you must not have this option enabled (edit the server's option file to remove this setting).
Upvotes: 146
Reputation: 442
This is how I get the N max rows per group in MySQL:
SELECT co.id, co.person, co.country
FROM person co
WHERE (
SELECT COUNT(*)
FROM person ci
WHERE co.country = ci.country AND co.id < ci.id
) < 1
;
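How it works: for each row co, the subquery counts the rows ci from the same country (co.country = ci.country) that have a larger id (co.id < ci.id).
) < 1 keeps only the rows for which no such row exists, i.e. the row with the largest id in each country.
So for the top 3 elements per country, use ) < 3.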
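A sketch of the correlated COUNT(*) filter, adapted to the question's table by comparing age instead of id. It runs in SQLite via Python's sqlite3. persons, grp, and the helper top_n_per_group are hypothetical names. A row qualifies when fewer than n people in its group are older, so n = 1 returns the oldest person per group and n = 2 the two oldest.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical copy of the question's data; "grp" stands in for the reserved word GROUP.
conn.execute("CREATE TABLE persons (person TEXT, grp INTEGER, age INTEGER)")
conn.executemany(
    "INSERT INTO persons VALUES (?, ?, ?)",
    [("Bob", 1, 32), ("Jill", 1, 34), ("Shawn", 1, 42),
     ("Jake", 2, 29), ("Paul", 2, 36), ("Laura", 2, 39)],
)

def top_n_per_group(n):
    """Rows with fewer than n older people in the same group (hypothetical helper)."""
    return conn.execute(
        """
        SELECT co.person, co.grp, co.age
        FROM persons co
        WHERE (
            SELECT COUNT(*)
            FROM persons ci
            WHERE co.grp = ci.grp AND co.age < ci.age
        ) < ?
        ORDER BY co.grp, co.age DESC
        """,
        (n,),
    ).fetchall()

print(top_n_per_group(1))  # [('Shawn', 1, 42), ('Laura', 2, 39)]
print(top_n_per_group(2))  # [('Shawn', 1, 42), ('Jill', 1, 34), ('Laura', 2, 39), ('Paul', 2, 36)]
```

As with the id version, rows that tie on age each count the same number of older peers, so ties can return more than n rows per group.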
Full example here:
mysql select n max values per group
Upvotes: 1
Reputation: 17
If the id (and all other columns) is needed from mytable:
SELECT *
FROM mytable
WHERE id NOT IN (
    SELECT A.id
    FROM mytable AS A
    JOIN mytable AS B ON A.`Group` = B.`Group`
        AND A.age < B.age
);
Upvotes: 0
Reputation: 41
My solution works only if you need to retrieve a single column; however, for my needs it was the best-performing solution I found (it uses only a single query!):
SELECT SUBSTRING_INDEX(GROUP_CONCAT(column_x ORDER BY column_y),',',1) AS xyz,
column_z
FROM table_name
GROUP BY column_z;
It uses GROUP_CONCAT to create an ordered, comma-separated list and then SUBSTRING_INDEX to keep only the first element.
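To make the trick concrete, the Python sketch below emulates what SUBSTRING_INDEX(GROUP_CONCAT(... ORDER BY ...), ',', 1) computes for each group. It is an illustration, not MySQL itself, and the rows and the function name are hypothetical. GROUP_CONCAT sorts ascending unless told otherwise, so ORDER BY column_y picks the value paired with the smallest column_y. For the oldest person, the order has to be age DESC, as in this sketch.

```python
from collections import defaultdict

# Hypothetical rows mirroring the question: (person, group, age).
rows = [("Bob", 1, 32), ("Jill", 1, 34), ("Shawn", 1, 42),
        ("Jake", 2, 29), ("Paul", 2, 36), ("Laura", 2, 39)]

def first_via_group_concat(rows):
    """Emulate SUBSTRING_INDEX(GROUP_CONCAT(person ORDER BY age DESC), ',', 1) per group."""
    members = defaultdict(list)
    for person, grp, age in rows:
        members[grp].append((age, person))
    result = {}
    for grp, pairs in members.items():
        # GROUP_CONCAT(person ORDER BY age DESC): e.g. "Shawn,Jill,Bob" for group 1
        ordered = sorted(pairs, key=lambda p: -p[0])
        concatenated = ",".join(person for _, person in ordered)
        # SUBSTRING_INDEX(..., ',', 1): the text before the first comma
        result[grp] = concatenated.split(",", 1)[0]
    return result

print(first_via_group_concat(rows))  # {1: 'Shawn', 2: 'Laura'}
```

Because the first element is cut at the first comma, a value that itself contains a comma (say "Smith, J") would come back truncated to "Smith".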
Upvotes: 3
Reputation: 1274
axiac's solution is what worked best for me in the end. However, I had an additional complication: a calculated "max value", derived from two columns.
Let's use the same example: I would like the oldest person in each group. If several people are equally old, take the tallest one.
I had to perform the left join twice to get this behavior:
SELECT o1.* FROM
(SELECT o.*
FROM `Persons` o
LEFT JOIN `Persons` b
ON o.Group = b.Group AND o.Age < b.Age
WHERE b.Age is NULL) o1
LEFT JOIN
(SELECT o.*
FROM `Persons` o
LEFT JOIN `Persons` b
ON o.Group = b.Group AND o.Age < b.Age
WHERE b.Age is NULL) o2
ON o1.Group = o2.Group AND o1.Height < o2.Height
WHERE o2.Height is NULL;
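A sketch of the two-level anti-join above in SQLite via Python's sqlite3, with FROM in the outer query. The persons table, the grp column, and all heights are hypothetical. Shawn and a made-up Mia share the same age, so height has to decide between them. The first anti-join is kept in one string and reused for both derived tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data with a height column; Shawn and Mia tie on age in group 1.
conn.execute("CREATE TABLE persons (person TEXT, grp INTEGER, age INTEGER, height INTEGER)")
conn.executemany(
    "INSERT INTO persons VALUES (?, ?, ?, ?)",
    [("Bob", 1, 32, 175), ("Jill", 1, 34, 168), ("Shawn", 1, 42, 180),
     ("Mia", 1, 42, 185), ("Jake", 2, 29, 170), ("Paul", 2, 36, 182),
     ("Laura", 2, 39, 165)],
)

# Oldest people per group (ties included), as in axiac's anti-join.
OLDEST = """
    SELECT o.*
    FROM persons o
    LEFT JOIN persons b ON o.grp = b.grp AND o.age < b.age
    WHERE b.age IS NULL
"""

# Second anti-join over that result: drop anyone with a taller peer in the same group.
oldest_then_tallest = conn.execute(
    f"""
    SELECT o1.person, o1.grp, o1.age, o1.height
    FROM ({OLDEST}) o1
    LEFT JOIN ({OLDEST}) o2
      ON o1.grp = o2.grp AND o1.height < o2.height
    WHERE o2.height IS NULL
    ORDER BY o1.grp
    """
).fetchall()
print(oldest_then_tallest)  # [('Mia', 1, 42, 185), ('Laura', 2, 39, 165)]
```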
Hope this helps! I suspect there is a better way to do this, though...
Upvotes: 2
Reputation: 1
Let the table name be people:
select O.* -- O for the oldest person
from people O , people T
where O.grp = T.grp and
O.Age =
(select max(T.age) from people T where O.grp = T.grp
group by T.grp)
group by O.grp;
Upvotes: 0
Reputation: 1361
MySQL has supported the ROW_NUMBER() window function since version 8.0, so you can use it there to get the desired result. On SQL Server you can do something similar to:
CREATE TABLE p
(
person NVARCHAR(10),
gp INT,
age INT
);
GO
INSERT INTO p
VALUES ('Bob', 1, 32);
INSERT INTO p
VALUES ('Jill', 1, 34);
INSERT INTO p
VALUES ('Shawn', 1, 42);
INSERT INTO p
VALUES ('Jake', 2, 29);
INSERT INTO p
VALUES ('Paul', 2, 36);
INSERT INTO p
VALUES ('Laura', 2, 39);
GO
SELECT t.person, t.gp, t.age
FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY gp ORDER BY age DESC, person) row
FROM p
) t
WHERE t.row = 1;
Upvotes: 8
Reputation: 3883
This method has the benefit of letting you rank by a different column without discarding the other data. It's quite useful when you want to list orders with a column for items, heaviest first.
Source: http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html#function_group-concat
SELECT person, `group`,
GROUP_CONCAT(
DISTINCT age
ORDER BY age DESC SEPARATOR ', follow up: '
)
FROM sql_table
GROUP BY `group`;
Upvotes: 0
Reputation: 1528
I would not use Group as a column name, since it is a reserved word. However, the following SQL would work:
SELECT a.Person, a.Group, a.Age FROM [TABLE_NAME] a
INNER JOIN
(
SELECT `Group`, MAX(Age) AS oldest FROM [TABLE_NAME]
GROUP BY `Group`
) b ON a.Group = b.Group AND a.Age = b.oldest
Upvotes: 3
Reputation: 611
You can also try
SELECT * FROM mytable WHERE (`Group`, age) IN (SELECT `Group`, MAX(age) FROM mytable GROUP BY `Group`);
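The sketch below runs in SQLite through Python's sqlite3 on hypothetical data. It compares filtering on age alone with filtering on the (group, age) pair. An extra group 3, whose maximum age (36) equals Paul's non-maximum age in group 2, shows the difference: the age-only filter also returns Paul. persons and grp are assumed names, and row-value IN requires SQLite 3.15 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: the question's rows plus a group 3 whose maximum age (36)
# equals a non-maximum age in group 2 (Paul).
conn.execute("CREATE TABLE persons (person TEXT, grp INTEGER, age INTEGER)")
conn.executemany(
    "INSERT INTO persons VALUES (?, ?, ?)",
    [("Bob", 1, 32), ("Jill", 1, 34), ("Shawn", 1, 42),
     ("Jake", 2, 29), ("Paul", 2, 36), ("Laura", 2, 39),
     ("Kim", 3, 36)],
)

# Matches the age alone: any age that is the maximum of *some* group qualifies.
age_only = conn.execute(
    """
    SELECT person, grp, age FROM persons
    WHERE age IN (SELECT MAX(age) FROM persons GROUP BY grp)
    ORDER BY grp, person
    """
).fetchall()
print(age_only)  # [('Shawn', 1, 42), ('Laura', 2, 39), ('Paul', 2, 36), ('Kim', 3, 36)]

# Matches (group, age) pairs, so a row must hold the maximum of its own group.
group_and_age = conn.execute(
    """
    SELECT person, grp, age FROM persons
    WHERE (grp, age) IN (SELECT grp, MAX(age) FROM persons GROUP BY grp)
    ORDER BY grp, person
    """
).fetchall()
print(group_and_age)  # [('Shawn', 1, 42), ('Laura', 2, 39), ('Kim', 3, 36)]
```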
Upvotes: 0
Reputation: 21
WITH CTE AS
(SELECT Person,
[Group], Age, RN = ROW_NUMBER()
OVER (PARTITION BY [Group]
ORDER BY Age DESC, Person)
FROM yourtable)
SELECT Person, Age FROM CTE WHERE RN = 1;
Upvotes: 1
Reputation: 11
Using CTEs - Common Table Expressions:
WITH MyCTE(MaxPKID, SomeColumn1)
AS(
SELECT MAX(a.MyTablePKID) AS MaxPKID, a.SomeColumn1
FROM MyTable1 a
GROUP BY a.SomeColumn1
)
SELECT b.MyTablePKID, b.SomeColumn1, b.SomeColumn2, MAX(b.NumEstado)
FROM MyTable1 b
INNER JOIN MyCTE c ON c.MaxPKID = b.MyTablePKID
GROUP BY b.MyTablePKID, b.SomeColumn1, b.SomeColumn2
--Note: MyTablePKID is the primary key of MyTable1
Upvotes: 1