ThePrimeagen

Reputation: 4582

MySQL, GroupBy OrderBy

The first question you're probably asking is how many GROUP BY / ORDER BY questions there are on SO. A lot, but none of them fit my specific circumstance. My SQL query is shown further down.

The goal is to get the latest media for each team in $teamArr (a set of unique ids).

The problem is that the query does not return the latest media for each team. Which media row I get depends on the row the GROUP BY happens to pick for each team, and the ORDER BY on media time is applied across all teams rather than within each team.

Here is the MySQL query (built with PHP's sprintf() for ease of use with the team ids).

sprintf("SELECT M.*, T.name, T.has_image
    FROM media AS M
        JOIN teams AS T
          ON T.team_uid = M.team_uid
    WHERE M.team_uid IN (%s)
    GROUP BY T.team_uid
    ORDER BY M.time DESC", implode(", ", $teamArr));

Scenario:

Team A has a media item from 3 hours ago, Team B has 2 items from 6 hours ago and 2 days ago, and Team C has a media item from 9 hours ago.

If the order of the select turns out to be Team A, C, B then the media items will be as follows...

There will probably be some silly remarks saying

Just order by team name dummy! Such a simple problem. (this answer will get about +4 votes)

But that clearly does not help me at all. I cannot know which teams will have the optimal order (if any natural or unnatural ordering will even work with more than 3 cases). Is there a way to do this in 1 SQL query? I hear that running MySQL queries in a loop is a poor decision, which is why I want this to work in 1 SQL call.

Upvotes: 4

Views: 206

Answers (1)

mathematical.coffee

Reputation: 56945

For future reference, this sort of query is usually tagged "greatest-n-per-group".

You want the maximum time per team and the corresponding columns, right?

The typical way to solve this is to LEFT JOIN the table you want to get the max of to itself (in your case media):

SELECT M.*, T.name, T.has_image
FROM media AS M
LEFT JOIN media AS M2              -- new: join media to itself
       ON M.team_uid = M2.team_uid -- within each team_uid (GROUP BY var)
      AND M.time < M2.time         -- and a sort on time (ORDER BY var)
JOIN teams AS T              -- same as before.
 ON T.team_uid = M.team_uid
WHERE M.team_uid IN (%s)     
  AND M2.time IS NULL        -- new: this selects the MAX M.time for each team.

The changes:

  • LEFT JOIN media to itself within each team_uid (the GROUP BY variable), and such that M.time < M2.time (the variable we want the MAX of).
  • Since it's a LEFT JOIN, if there is an M.time for which there is no larger M2.time (for the same team), then M2.time will be NULL. This is precisely when M.time is the MAX time for that team.
  • add the M2.time IS NULL in the WHERE condition (to enforce the above)
  • remove the GROUP BY, ORDER BY (dealt with in the join condition).
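To see the self-join at work on the scenario from the question, here is a minimal sketch you can run locally. It rests on several assumptions that are not in the original post: it uses Python's built-in sqlite3 module with an in-memory database instead of MySQL and PHP, the media_uid column and the integer ids are made up for illustration, and time is stored as negated "hours ago" so that a larger value means more recent. The final ORDER BY is optional and only sorts the already-filtered rows.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL schema (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE teams (team_uid INTEGER PRIMARY KEY, name TEXT, has_image INTEGER);
    -- media_uid is a hypothetical column, added only to tell the rows apart
    CREATE TABLE media (media_uid INTEGER PRIMARY KEY, team_uid INTEGER, time INTEGER);

    INSERT INTO teams VALUES (1, 'A', 0), (2, 'B', 1), (3, 'C', 0);
    -- time = -(hours ago): A has 3h, B has 6h and 2 days (48h), C has 9h
    INSERT INTO media VALUES (10, 1, -3), (20, 2, -6), (21, 2, -48), (30, 3, -9);
""")

team_ids = [1, 2, 3]
# One "?" per id, so the ids are bound as parameters instead of being
# interpolated into the SQL string.
placeholders = ", ".join("?" for _ in team_ids)

query = f"""
    SELECT M.media_uid, T.name, M.time
    FROM media AS M
    LEFT JOIN media AS M2
           ON M.team_uid = M2.team_uid   -- same team
          AND M.time < M2.time           -- M2 is strictly newer than M
    JOIN teams AS T
      ON T.team_uid = M.team_uid
    WHERE M.team_uid IN ({placeholders})
      AND M2.time IS NULL                -- no newer row exists, so M is the latest
    ORDER BY M.time DESC                 -- optional: newest team media first
"""

latest = conn.execute(query, team_ids).fetchall()
for media_uid, name, time in latest:
    print(media_uid, name, time)
```

With this data each team contributes exactly one row, and Team B's 2-day-old item (media_uid 21) is filtered out because a newer row exists for the same team. One caveat: if two media rows for a team share the same maximum time, both survive the M2.time IS NULL filter, so a tie-breaker on a unique column would be needed in the join condition.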

Upvotes: 6
