Reputation:
I have a table containing conversation logs. Each message is a table row, and each conversation is multiple rows that share the same conversationId
.
I am selecting unique conversationId
s for the last month, along with the other columns (such as the text and timestamp) from the most recent message in each conversation.
This works fine and the query takes around 1 second or less:
SELECT
*
FROM
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY conversationId ORDER BY id DESC) AS rn
FROM Logs
WHERE
timestamp >= DATEADD(month, -1, GETDATE())
) AS t
WHERE
t.rn = 1
ORDER BY
timestamp DESC
Now, I also want to add a new column messageCount
which has the count of all messages for any time for each conversationId
.
To do this, I added one more line to the query:
SELECT
*,
(SELECT COUNT(id) FROM Logs WHERE conversationId = t.conversationId) AS messageCount
FROM
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY conversationId ORDER BY id DESC) AS rn
FROM Logs
WHERE
timestamp >= DATEADD(month, -1, GETDATE())
) AS t
WHERE
t.rn = 1
ORDER BY
timestamp DESC
This returns the correct results, but it is five times slower, as much as 5 seconds when there are merely 14 rows in the resulting table.
What am I doing wrong here and how can I make this query faster?
Upvotes: 1
Views: 150
Reputation: 37473
Row_number is slower if id column has index, in that you can use subquery like below:
select t.conversationId, count(id) as messagecount
from logs t
where t.id in (
SELECT MAX(t1.id)
from logs t1
WHERE t1.conversationId = t.conversationId
GROUP BY t1.conversationId
) and timestamp >= DATEADD(month, -1, GETDATE())
group by t.conversationId
Upvotes: 3
Reputation: 521259
You may try calling COUNT
as an analytic function directly in the subquery:
WITH cte AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY conversationId ORDER BY id DESC) AS rn,
COUNT(id) OVER (PARTITION BY conversationId) messageCount
FROM Logs
WHERE
timestamp >= DATEADD(month, -1, GETDATE())
)
SELECT *
FROM cte
WHERE rn = 1;
At the very least, this avoids the potentially costly correlated subquery you were using. Beyond this, if the above is still too slow for you, then the next step would be to look at the execution plan.
Upvotes: 2