user1405177
user1405177

Reputation: 477

How to get the most recent row in SQL?

I have a messaging system.

threads
+----+-------+
| id | title |
+----+-------+
| PK | TEXT  |
+----+-------+

messages
+----+--------------+----------------+-----------+-------------+---------+
| id |   from_id    |   thread_id    |   sent    |   parent    | message |
+----+--------------+----------------+-----------+-------------+---------+
| PK | FK(users.id) | FK(threads.id) | TIMESTAMP | messages.id | TEXT    |
+----+--------------+----------------+-----------+-------------+---------+

recipients
+----+-----------------+--------------+--------+
| id |     msg_id      |    to_id     | status |
+----+-----------------+--------------+--------+
| PK | FK(messages.id) | FK(users.id) | ENUM   |
+----+-----------------+--------------+--------+

users
+----+---------+
| id |  name   |
+----+---------+
| PK | VARCHAR |
+----+---------+

Essentially, it's a messaging system in which:

So hopefully my schema is correct.

I wish to get a list of all threads, which shows the most recent message inside the thread, and the author of that message:

+----------+------------+-----------+--------------+------------------+---------------+-------------------+
| users.id | users.name | thread.id | thread.title | messages.message | messages.sent | recipients.status |
+----------+------------+-----------+--------------+------------------+---------------+-------------------+

The issue is getting the most recent message as part of the query. Given that recipients.status = 1 means unread.. Ignoring users for now (that'd be a relatively simple join onto the rest of the tables...), also assuming we want user 1's threads:

SELECT threads.id, title, message, sent, recipients.status
FROM recipients
JOIN messages
ON messages.id=recipients.msg_id
JOIN threads ON threads.id=messages.thread_id
WHERE recipients.to_id=1
AND recipients.status=1

This gets me all messages in all threads that the user is participating in. However, I only need the most recent, and that is where I am stuck.

One solution which I am very not fond of (are there any reasons NOT to do this?)

SELECT *
FROM (
    SELECT threads.id, title, message, sent, recipients.status
    FROM recipients
    JOIN messages
    ON messages.id=recipients.msg_id
    JOIN threads ON threads.id=messages.thread_id
    WHERE recipients.to_id=1
    AND recipients.status=1
    ORDER BY sent DESC
) a
GROUP BY id

Upvotes: 1

Views: 93

Answers (4)

FuzzyTree
FuzzyTree

Reputation: 32402

One solution which I am very not fond of (are there any reasons NOT to do this?)

Your query will not necessarily select the row with the latest sent value for each thread. Even though your inner query orders by sent DESC, mysql is free to choose any value from each group:

https://dev.mysql.com/doc/refman/5.0/en/group-by-extensions.html

MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. This means that the preceding query is legal in MySQL. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause. Sorting of the result set occurs after values have been chosen, and ORDER BY does not affect which values within each group the server chooses.

I recommend using variables to emulate row_number() to number messages within a thread in order of when they were sent (i.e. most recent sent message within a thread will be #1, 2nd most recent #2, etc) and then to only keep #1 messages.

SELECT * FROM (
    SELECT threads.id, title, message, sent, recipients.status,
    @rowNumber := IF(@prevId = threads.id,@rowNumber+1,1) rowNumber,
    @prevId := threads.id
    FROM recipients
    JOIN messages
    ON messages.id=recipients.msg_id
    JOIN threads ON threads.id=messages.thread_id
    WHERE recipients.to_id=1
    AND recipients.status=1
    ORDER BY threads.id, sent DESC
) t1 WHERE rowNumber = 1

Edit

Another way using not exists to only select messages where a more recent message in the same thread does not exist.

SELECT threads.id, title, message, sent, recipients.status
FROM recipients
JOIN messages
ON messages.id=recipients.msg_id
JOIN threads ON threads.id=messages.thread_id
WHERE recipients.to_id=1
AND recipients.status=1
AND NOT EXISTS (
    SELECT 1 FROM threads t2
    WHERE t2.id = threads.id
    AND t2.sent > threads.sent
)

Upvotes: 1

user1405177
user1405177

Reputation: 477

I was able to accomplish it with this:

SELECT threads.id AS thread_id, threads.title, users.id AS user_id, users.name, m1.message, m1.sent
FROM messages m1
LEFT JOIN messages m2
ON m1.thread_id = m2.thread_id AND m1.sent < m2.sent
JOIN recipients
ON recipients.status=1
AND recipients.msg_id=m1.id
AND recipients.to_id=1
JOIN threads
ON threads.id=m1.thread_id
JOIN users
ON m1.from_id=users.id
WHERE m2.sent IS NULL

The relevant parts to the problem would be:

SELECT ...
FROM messages m1
LEFT JOIN messages m2
ON m1.thread_id = m2.thread_id AND m1.sent < m2.sent
WHERE m2.sent IS NULL

Upvotes: 0

O. Jones
O. Jones

Reputation: 108841

The most recent message id from each thread can be gotten thusly.

SELECT MAX(id) AS most_recent_message_id,
       thread_id
  FROM messages
 GROUP BY thread_id

Do you want this to be efficient? In that case create a compound index on (thread_id, id).

If you want a list of threads in which a given user (let's say user 42) is participating either as originator or recipient, you need to use a UNION operator

SELECT DISTINCT thread_id FROM messages WHERE user_id = 42 UNION SELECT DISTINCT thread_id FROM recipients JOIN messages ON recipients.msg_id = messages.id WHERE recipients.to_id = 42

This gets you the threads in which a user is participating.

So, if you want the ids of the most recent messages in which your user is participating (either as originator or recipient) you join those two subqueries

SELECT most_recent_message_id
  FROM (
        SELECT MAX(id) AS most_recent_message_id,
               thread_id
          FROM messages
         GROUP BY thread_id
       ) AS a
  JOIN (
       SELECT DISTINCT thread_id
         FROM messages
        WHERE user_id = 42
       UNION
       SELECT DISTINCT thread_id
         FROM recipients
         JOIN messages ON recipients.msg_id = messages.id
        WHERE recipients.to_id = 42
       ) AS b ON a.thread_id = b.thread_id 

See how this goes? You use the aggregation MAX() and set-construction (DISTINCT, UNION) features of SQL to construct the lists of relevant items, then you JOIN to get the list you want.

I assume that once you have an appropriate list of messages, you can go get the content you need with another join or two.

Upvotes: 0

Aurelien Ecoto
Aurelien Ecoto

Reputation: 226

Same answer than SoftwareCarpente, just add an Order By desc on the message id (or the timestamp) and add a LIMIT 1 if you only need the 1st row.

Upvotes: 0

Related Questions