Joining tables based on the maximum id

Question

I have found three questions which all seem to ask a similar question:

Getting max value from rows and joining to another table

Select only rows by join tables max value

Joining tables based on the maximum value

But I'm having a hard time wrapping my head around how exactly to join tables keeping only the maximum row of one of the tables when the maximum is in the id or index field itself.

I am looking for an answers that only require joins because this will allow the solution to work in a tool which generates queries for which it is easy to get it to generate the corresponding joins, although sub-queries are probably doable as well with a bit more effort. I found the answer below to be of particular interest:

SELECT DISTINCT b.id, b.filename, a1.name
FROM a a1
JOIN b
  ON b.id = a1.id
LEFT JOIN a a2
  ON a2.id = a1.id
  AND a2.rank > a1.rank
WHERE a2.id IS NULL

However, in my case the ranking column is also the index, e.g. "id". I cannot compare for equality and greater than at the same time, because they will never be true at the same time!

Also, potentially complicating the situation is that a typical query in which I have need of this may join several tables (3-5 is not uncommon). So as a simplified example of my query:

SELECT
    table1.field1, table1.field2, table1.field3,
    table2.field1, table2.field2, table2.field3,
    table3.field1, table3.field2, table3.field3,
    table4.field1, table4.field2, table4.field3
FROM table1
INNER JOIN table2 ON
    table1.field1 = table2.field1
    AND table1.field2 = table2.field2
    AND table2.field3 < 0
INNER JOIN table3 ON
    table2.field1 = table3.field1
    AND table2.field4 = table3.field4
INNER JOIN table4 ON
    table1.field1 = table4.field1
    AND table1.field2 = table4.field2

And what I want to do is to eliminate duplicates in table3 by only getting the row with the maximum id (e.g. MAX(table3.id)) for each unique combination of all the other fields. That is to say, the above query is returning something like this:

+-------+-------+-------+---------+
| table1| table2| table4|table3   |
+-------+-------+-------+---------+
|  A    |   A   |   A   | 1,...   |
|  A    |   A   |   A   | 2,...   |
|  A    |   A   |   A   | 3,...   |
|  A    |   A   |   A   | MAX2,...|
|  B    |   B   |   B   | 1,...   |
|  B    |   B   |   B   | 2,...   |
|  B    |   B   |   B   | 3,...   |
|  B    |   B   |   B   | MAX2,...|
+-------+-------+-------+---------+

(I'm just using A and B to denote that I'm talking about all the same values for the fields in table1, table2, and table4 for a particular set of rows.)

and I want to reduce it to this:

+-------+-------+-------+---------+
| table1| table2| table4|table3   |
+-------+-------+-------+---------+
|  A    |   A   |   A   | MAX1,...|
|  B    |   B   |   B   | MAX2,...|
+-------+-------+-------+---------+

S3S · Accepted Answer

You can add a derived table to reduce the matching rows in TABLE3 to one per group. Another method would use a window function but you asked for a JOIN only

SELECT
    table1.field1, table1.field2, table1.field3,
    table2.field1, table2.field2, table2.field3,
    table3.field1, table3.field2, table3.field3,
    table4.field1, table4.field2, table4.field3
FROM table1
INNER JOIN table2 ON
    table1.field1 = table2.field1
    AND table1.field2 = table2.field2
    AND table2.field3 < 0
INNER JOIN table3 ON
    table2.field1 = table3.field1
    AND table2.field4 = table3.field4

--here is the added derived table. Change column names as needed
INNER JOIN (select UID, ID = max(ID) from Table3 group by UID) x
    on x.UID = table3.UID and x.mx = table3.ID

INNER JOIN table4 ON
    table1.field1 = table4.field1
    AND table1.field2 = table4.field2

Or, perhaps... something like below. It really depends on your schema and that's hard to understand with the sample data.

INNER JOIN (select field1, field4, mx = max(ID) from Table3 group by field1, field4) x
    on x.field1 = table3.field1 and x.field4 = table3.field4 and x.mx = table3.ID

Here is an example. You'll notice that the last three column pairs are identical. You only want the last one, which is the max(id) for that grouping. What ever makes a row unique in relation to the rest of your data (not your primary key, but what you are joining with) is what you'd want to include int he derived table and join condition.

declare @table table (id int identity(1,1), f1 char(1), f2 char(1))
insert into @table
values
('a','b'),
('a','c'),
('a','a'),
('b','b'),
('b','b'),
('b','b')

select * from @table

select t1.*
from @table t1
inner join 
    (select f1, f2, mx = max(id) from @table group by f1, f2) t2 on
    t1.f1 = t2.f1
    and t1.f2 = t2.f2
    and t1.id = t2.mx

Joining tables based on the maximum id

Answers (1)

Related Questions