Him
Him

Reputation: 5551

SQL index on join and group by

I have a query that is running slowly. I'm pretty sure that the bottleneck is a sequential scan in the plan, so I would like to build appropriate indexes and/or rearrange my query to improve upon that.

Here is my query (and here is a fiddle with a schema and test data):

SELECT conversations.id, max(messages.timestamp) as latest_message FROM
    conversations JOIN messages on conversations.id = messages.cid
    GROUP BY conversations.id ORDER BY latest_message;

I have made indexes on all of the involved columns, and nested indexes on cid and timestamp in both directions, all to no avail. The sequential scan remains:

 Sort  (cost=31.31..31.56 rows=100 width=12)
   Sort Key: (max(messages."timestamp"))
   ->  HashAggregate  (cost=26.99..27.99 rows=100 width=12)
         Group Key: conversations.id
         ->  Hash Join  (cost=3.25..21.99 rows=1000 width=12)
               Hash Cond: (messages.cid = conversations.id)
               ->  Seq Scan on messages  (cost=0.00..16.00 rows=1000 width=12)
               ->  Hash  (cost=2.00..2.00 rows=100 width=4)
                     ->  Seq Scan on conversations  (cost=0.00..2.00 rows=100 width=4)

How can I improve this query and/or what indexes can I build to fix these sequential scans?

Upvotes: 1

Views: 166

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1270431

You don't need the join:

SELECT m.cid, max(m.timestamp) as latest_message
FROM messages m 
GROUP BY m.cid
ORDER BY latest_message;

This should be able to use an index on messages(cid, timestamp desc). However, it might be faster to write this as:

SELECT DISTINCT ON (m.cid) m.*
FROM messages m
ORDER BY m.cid, m.timestamp DESC;

With the same index.

Upvotes: 2

Related Questions