codeispoetry
codeispoetry

Reputation: 373

MySQL: Select all row with MAX() value from n relational tables

I did try to look for this answer, but no luck : (

It should be something basic, but I can't make it work...

I have 3 tables:

  1. flightPerson
personID flightID
1 587
51 44
1 37
... ...
  1. flight
flightID departure arrive
37 1998-06-01 1998-06-03
587 2022-01-01 2022-01-02
44 2022-01-01 0000-00-00
... ... ...
  1. country
personID countryID
1 12
51 27
... ...

GOAL: select peopleID, flightID, departure, arrive, countryID WHERE countryID = 12 AND ( LAST FLIGHT was before 2000-01-01 OR LAST FLIGHT was (after 2022-03-01 AND arrive time is 0000-00-00) )

Thanks for your help!

SELECT t1.personID, flight.flightID, MAX(flight.departure), flight.arrive, country.countryID
                                              FROM flightPerson as t1
                                              LEFT JOIN flight 
                                              ON t1.flightID = flight.flightID 
                                              LEFT JOIN country
                                              ON country.personID = t1.personID 
                                              WHERE country.countryID = 12 AND
                                              flight.departure < " 2000-01-01 " OR (flight.departure > " 2022-03-01" AND flight.arrive= "0000-00-00 00:00:00")  
GROUP BY personID
ORDER BY t1.personID  ASC

EDIT

Based on Chris Schaller answer, I rebuilt my query and it worked for me.

Basically (as Chris suggested in the second Legacy query) you first need to find the last flight ID, then retrieve details for that flight and "get" (using the WHERE clause) only flights whose departure and arrive respect your conditions.

I post it here in case it will be useful for someone else struggling with the same problem ; )

SELECT personID 
                        FROM (
                            SELECT DISTINCT t2.personID, country.countryID
                                ,(SELECT t1.flightID
                                    FROM flightPerson
                                    LEFT JOIN flight as t1
                                      ON t1.flightID = flightPerson.flightID 
                                    WHERE t2.personID = flightPerson.personID
                                    ORDER BY `t1`.` departure` DESC
                                    LIMIT 1
                                )AS lastFlightID
                            FROM flightPerson as t2
                            LEFT JOIN country 
                            ON country.personID = t2.personID
                        ) AS details
                        LEFT JOIN flight AS b
                        ON details. lastFlightID = b.flightID
                        WHERE details.countryID = 12 
                            AND ( b.departure < "2000-01-01"  
                                OR 
                                (b.departure > "2022-03-01" AND b.arrive = "0000-00-00 00:00:00") 
                                )

PS Sorry Chris, I still need to use the double brackets AND (... OR (...) )

Kind of I feel more secure this way... :D

Upvotes: 2

Views: 39

Answers (1)

Chris Schaller
Chris Schaller

Reputation: 16554

There are 2 general solutions to this, the simplest to explain is to use the ROW_NUMBER() window query to select the Last Flight instead of a group by. In MySQL v8 we can use a CTE to help keep the query readable:

WITH PersonFlightData as (
  SELECT t1.personID, flight.flightID, flight.departure, flight.arrive, country.countryID
  , ROW_NUMBER() OVER(PARTITION BY t1.personID ORDER BY flight.departure DESC) as RN
  FROM flightPerson as t1
  LEFT JOIN flight ON t1.flightID = flight.flightID 
  LEFT JOIN country ON country.personID = t1.personID 
  WHERE country.countryID = 12 
)
SELECT personID, flightID, departure, arrive, countryID
FROM PersonFlightData
WHERE RN = 1 --(filters to only include the LAST flight for each person)
  AND (departure < '2000-01-01' OR departure > '2022-03-01' AND arrive = '0000-00-00')
ORDER BY personID ASC

Unfortunately, for your provided dataset, there are no results to this query. Lets remove the departure filter to understand, in fact, lets move the departure filter to a select column to project it into the output:

WITH PersonFlightData as (
  SELECT t1.personID, flight.flightID, flight.departure, flight.arrive, country.countryID
  , ROW_NUMBER() OVER(PARTITION BY t1.personID ORDER BY flight.departure DESC) as RN
  FROM flightPerson as t1
  LEFT JOIN flight ON t1.flightID = flight.flightID 
  LEFT JOIN country ON country.personID = t1.personID 
  WHERE country.countryID = 12 
)
SELECT personID, flightID, departure, arrive, countryID
  , CASE WHEN departure < '2000-01-01' OR departure > '2022-03-01' AND arrive = '0000-00-00' THEN 1 END as Output
FROM PersonFlightData
WHERE RN = 1 --(filters to only include the LAST flight for each person);
personID flightID departure arrive countryID Output
1 587 2022-01-01 2022-01-02 12

View this proof in a fiddle: https://www.db-fiddle.com/f/jKsg1B5RjW5UhTsLbtQHwe/0 Update the schema there with additional data to see if your desired flights are included.

So the last flight for personID=1 was on 2022-01-02, which is not in the required range. personID=51 flights are excluded as their country is 27, but the date of their last flight departure, even though it still has not yet landed ;) is not in the filtered range.

Legacy Versions

For older versions, we can't use CTEs or the ROW_NUMBER() window function, so lets go back to using GROUP BY.

The problem with a GROUP BY, although it seems logical, is that you need to first apply the grouping to determine the Last Flight and then you need to apply your filter only to the results from the Last Flight query. That is one of the problems that using a CTE also solved for us, in this case we will have to use a nested query:

SELECT d.personID, f.flightID, f.departure, f.arrive, countryID
FROM (
  SELECT t1.personID, MAX(flight.departure) AS LastFlightDeparture
  FROM flightPerson as t1
  LEFT JOIN flight ON t1.flightID = flight.flightID 
  GROUP BY personID
) d
LEFT JOIN flightPerson fp ON d.personID = fp.personID
LEFT JOIN flight f ON fp.flightID = f.flightID AND f.departure = d.LastFlightDeparture
LEFT JOIN country ON country.personID = d.personID 
WHERE country.countryID = 12 
  AND (f.departure < '2000-01-01' OR f.departure > '2022-03-01' AND f.arrive = '0000-00-00')
ORDER BY personID ASC;

You can see in this query we only get the departure of the Last Flight, which isn't very efficient to join back into the rest of the query, I would prefer to get the ID of the last flight and use that, but to get the ID will require a different type of sub-query that might be even more inefficient, it is certainly increasing in complexity and becoming harder to read:

SELECT personID, flightID, departure, arrive, countryID, LastFlightID
FROM (
  SELECT fp.personID, fp.flightID, f.departure, f.arrive, country.countryID
    , (SELECT flight.flightID
       FROM flight
       LEFT JOIN flightPerson ON flight.flightID = flightPerson.flightID
       WHERE flightPerson.personID = fp.personID
       ORDER BY departure DESC
       LIMIT 1
    ) as LastFlightID
  FROM flightPerson fp
  LEFT JOIN flight f ON fp.flightID = f.flightID
  LEFT JOIN country ON country.personID = fp.personID 
) flightData
WHERE countryID = 12 
  AND flightID = LastFlightID
  AND (departure < '2000-01-01' OR departure > '2022-03-01' AND arrive = '0000-00-00')
ORDER BY personID ASC;

View this fiddle in MySQL 5.6 here: http://sqlfiddle.com/#!9/a8e82d/3

Finally, a special note about the OR clause...

Your original filter expression:

WHERE country.countryID = 12 
  AND flight.departure < " 2000-01-01 " 
   OR (flight.departure > " 2022-03-01" AND flight.arrive= "0000-00-00 00:00:00")

Is missing some brackets, which I think you are aware of as you had the brackets correct in your explanation:

WHERE countryID = 12 AND ( LAST FLIGHT was before 2000-01-01 OR LAST FLIGHT was (after 2022-03-01 AND arrive time is 0000-00-00) )

What I like to do (because I am an application developer first) is to use indentation to visually separate the content within brackets to make it easier to identify the sub-expressions.

WHERE countryID = 12 
  AND (
        LAST FLIGHT was before 2000-01-01 
        OR LAST FLIGHT was (after 2022-03-01 AND arrive time is 0000-00-00)
  )

But in this case the inner brackets are not necessary because the OR statement will evaluate the expression on either side of the OR independently:

WHERE country.countryID = 12 
  AND (flight.departure < '2000-01-01' OR flight.departure > '2022-03-01' AND flight.arrive= '0000-00-00')

Which fits on one line, but perhaps is easier to read like this:

WHERE country.countryID = 12 
  AND (
       flight.departure < '2000-01-01' 
       OR flight.departure > '2022-03-01' AND flight.arrive= '0000-00-00'
  )

FYI: The fact that X number of tables is involved here is not really the complicating factor. The problem is that you want to apply additional logic to the grouped results, without that logic affecting the grouping. You then also want additional metadata from the rows that correlate to the MAX/MIN or FIRST/LAST record from the grouping.

In SQL that can really only be achieved though sub-queries, whether you use CTEs or Views or Table Valued Variables, the execution is the same, we need to force the engine to evaluate one resultset before applying additional filtering criteria.

Upvotes: 1

Related Questions