user6269864
user6269864

Reputation:

Extremely slow query when trying to COUNT(id) for each row in the results

I have a table containing conversation logs. Each message is a table row, and each conversation is multiple rows that share the same conversationId.

I am selecting unique conversationIds for the last month, along with the other columns (such as the text and timestamp) from the most recent message in each conversation.

This works fine and the query takes around 1 second or less:

SELECT 
    *

FROM
    (
      SELECT 
        *,
        ROW_NUMBER() OVER (PARTITION BY conversationId ORDER BY id DESC) AS rn

        FROM Logs 

        WHERE 
            timestamp >= DATEADD(month, -1, GETDATE())
    ) AS t

WHERE 
    t.rn = 1

ORDER BY
    timestamp DESC

Now, I also want to add a new column messageCount which has the count of all messages for any time for each conversationId.

To do this, I added one more line to the query:

SELECT 
    *, 
    (SELECT COUNT(id) FROM Logs WHERE conversationId = t.conversationId) AS messageCount

FROM
    (
      SELECT 
        *,
        ROW_NUMBER() OVER (PARTITION BY conversationId ORDER BY id DESC) AS rn

        FROM Logs 

        WHERE 
            timestamp >= DATEADD(month, -1, GETDATE())
    ) AS t

WHERE 
    t.rn = 1

ORDER BY
    timestamp DESC

This returns the correct results, but it is five times slower, as much as 5 seconds when there are merely 14 rows in the resulting table.

What am I doing wrong here and how can I make this query faster?

Upvotes: 1

Views: 150

Answers (2)

Fahmi
Fahmi

Reputation: 37473

Row_number is slower if id column has index, in that you can use subquery like below:

select t.conversationId, count(id) as messagecount 
from logs t 
where t.id in (
    SELECT MAX(t1.id)
    from logs t1
    WHERE t1.conversationId = t.conversationId
    GROUP BY t1.conversationId
) and timestamp >= DATEADD(month, -1, GETDATE()) 
group by t.conversationId

Upvotes: 3

Tim Biegeleisen
Tim Biegeleisen

Reputation: 521259

You may try calling COUNT as an analytic function directly in the subquery:

WITH cte AS (
    SELECT *,
       ROW_NUMBER() OVER (PARTITION BY conversationId ORDER BY id DESC) AS rn,
       COUNT(id) OVER (PARTITION BY conversationId) messageCount
    FROM Logs 
    WHERE 
        timestamp >= DATEADD(month, -1, GETDATE())
)

SELECT *
FROM cte
WHERE rn = 1;

At the very least, this avoids the potentially costly correlated subquery you were using. Beyond this, if the above is still too slow for you, then the next step would be to look at the execution plan.

Upvotes: 2

Related Questions