omega
omega

Reputation: 43843

How to sort groups in MySQL join operator?

In my sql I have this query

SELECT * FROM threads t 
JOIN (
    SELECT c.* 
    FROM comments c 
    WHERE c.thread_id = t.id 
    ORDER BY date_sent
    ASC LIMIT 1
    ) d ON t.id = d.thread_id 
ORDER By d.date_sent DESC

Basically I have two tables, threads and comments. Comments have a foreign key to the thread table. I want to get the earliest comment row for each thread row. Threads should have at least 1 comment. If it doesn't, then the thread row shouldn't be included.

In my query above, I do a select on thread, and then I join it with a custom query. I want to use t.id, where t is the select table outside the brackets. Inside the brackets I create a new result set thats comments are for the current thread. I do the sorting and limiting there.

Then afterwards, I sort it again, so its earliest on top. However when I run this, it gives an error #1054 - Unknown column 't.id' in 'where clause'.

Does anyone know whats wrong here?

Thanks

Upvotes: 1

Views: 23

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1269923

Michael's answer is correct. This is another answer that follows more the form of your query. You can do what you want as a correlated subquery and then join in the additional information:

SELECT *
FROM (SELECT t.*,
             (SELECT c.id
              FROM comments c 
              WHERE c.thread_id = t.id 
              ORDER BY c.date_sent ASC
              LIMIT 1
             ) as mostrecentcommentid
      FROM threads t 
     ) t JOIN
     comments c
     on t.mostrecentcommentid = c.id
ORDER By c.date_sent DESC;

It is possible that this has better performance, because it does not require aggregating all the data. However, for performance, you would want an index on comments(thread_id, date_set, id).

Upvotes: 1

Michael Berkowski
Michael Berkowski

Reputation: 270637

The unknown column t.id is due to the fact that the alias t is unknown inside the subquery, but indeed it isn't needed anyway since you join it in the ON clause.

Instead of a LIMIT 1, use a MIN(date_sent) aggregate grouped by thread_id in the subquery. Be careful also using SELECT * in a join query, if columns in both tables have the same names; better to list the columns explicitly.

SELECT
  /* List the columns you explicitly need here rather than *
     if there is any name overlap (like `id` for example) */
  t.*,
  c.*
FROM
  threads t
  /* join threads against the subquery returning only thread_id and earliest date_sent */
  INNER JOIN (
    SELECT thread_id, MIN(date_sent) AS firstdate
    FROM comments
    GROUP BY thread_id
  ) earliest ON t.id = earliest.thread_id
  /* then join the subquery back against the full comments table to get the other columns 
     in that table. The join is done on both thread_id and the date_sent timestamp */
  INNER JOIN comments c 
    ON earliest.thread_id = c.thread_id
    AND earliest.firstdate = c.date_sent
ORDER BY c.date_sent DESC

Upvotes: 1

Related Questions