Alien

Reputation: 426

PostgreSQL query ignoring index?

I have the following query:

SELECT a.id, a.col2, b.id, b.col2, c.id, c.col2
FROM a 
  JOIN b on b.fk_a_id = a.id 
  JOIN c on c.fk_a_id = a.id
  INNER JOIN d on d.fk_c_id = c.id 
WHERE a.owner_id NOT IN (1, 3, 100, 41)
GROUP BY a.id, b.id, c.id 
ORDER BY a.created_date desc
LIMIT __ OFFSET __

Indexes on: a.id, a.owner_id, b.id, c.id

However, none of my indexes are being used in this query. I have another similar query with one extra table join that is using indexes as I expect. Any ideas as to why this query is not making use of the indexes?

edit to include explain:

"Limit  (cost=7.88..7.89 rows=4 width=243) (actual time=175.824..175.825 rows=10 loops=1)" 
"  ->  Sort  (cost=7.88..7.89 rows=4 width=243) (actual time=175.822..175.822 rows=10 loops=1)" 
"        Sort Key: a.created_date DESC" 
"        Sort Method: quicksort  Memory: 27kB" 
"        ->  HashAggregate  (cost=7.78..7.84 rows=4 width=243) (actual time=175.771..175.778 rows=10 loops=1)" 
"              Group Key: a.id, b.id, c.id" 
"              ->  Hash Join  (cost=5.12..7.75 rows=4 width=243) (actual time=0.072..0.099 rows=20 loops=1)" 
"                    Hash Cond: (a.id = b.fk_a_id)" 
"                    ->  Hash Join  (cost=2.85..5.43 rows=4 width=163) (actual time=0.041..0.063 rows=20 loops=1)" 
"                          Hash Cond: (a.id = d.fk_a_id)" 
"                          ->  Seq Scan on table a  (cost=0.00..2.44 rows=27 width=126) (actual time=0.008..0.025 rows=28 loops=1)" 
"                                Filter: (owner_id <> ALL ('{1,3,100,41}'::bigint[]))" 
"                                Rows Removed by Filter: 1" 
"                          ->  Hash  (cost=2.76..2.76 rows=7 width=53) (actual time=0.027..0.027 rows=3 loops=1)" 
"                                Buckets: 1024  Batches: 1  Memory Usage: 9kB" 
"                                ->  Hash Join  (cost=1.16..2.76 rows=7 width=53) (actual time=0.019..0.023 rows=3 loops=1)" 
"                                      Hash Cond: (c.id = d.fk_c_id)" 
"                                      ->  Seq Scan on table c  (cost=0.00..1.39 rows=39 width=45) (actual time=0.002..0.004 rows=39 loops=1)" 
"                                      ->  Hash  (cost=1.07..1.07 rows=7 width=8) (actual time=0.007..0.007 rows=3 loops=1)" 
"                                            Buckets: 1024  Batches: 1  Memory Usage: 9kB"
"                                            ->  Seq Scan on table d  (cost=0.00..1.07 rows=7 width=8) (actual time=0.003..0.004 rows=3 loops=1)" 
"                    ->  Hash  (cost=2.12..2.12 rows=12 width=88) (actual time=0.022..0.022 rows=12 loops=1)" 
"                          Buckets: 1024  Batches: 1  Memory Usage: 9kB" 
"                          ->  Seq Scan on table b  (cost=0.00..2.12 rows=12 width=88) (actual time=0.005..0.013 rows=12 loops=1)"
"Planning time: 210.946 ms"
"Execution time: 175.987 ms"

Upvotes: 2

Views: 3776

Answers (2)

user330315


An index is used to (quickly) find "a few" rows in "many rows". It's not a magical silver bullet that makes everything go faster.

You don't have enough rows in your tables to make an index lookup efficient. And you are getting nearly all of them, not just a small part.

If you look at the plan you will see that the actual data retrieval never takes more than 0.1 ms (that's a tenth of a millisecond). The Seq Scans on tables c and d take only around 0.004 ms - no index will speed that up for a handful of rows.

Doing that through indexes with random I/O for just 20 or 30 rows would definitely be slower. Depending on the number of columns, even the 39 rows of the "biggest" table probably fit on a single block - which means that to read all rows with a Seq Scan, the database only needs a single I/O operation.

The slowest part of the plan is the HashAggregate not the reading of the data, so the choice to not use an index seems correct.
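You can confirm this yourself by temporarily disabling sequential scans for one session and comparing the plans. This is a diagnostic sketch only (don't leave the setting on in production); the LIMIT value is assumed:

```sql
-- Force the planner to prefer index access for this transaction only,
-- then compare cost and actual time against the original plan.
BEGIN;
SET LOCAL enable_seqscan = off;

EXPLAIN (ANALYZE, BUFFERS)
SELECT a.id, a.col2, b.id, b.col2, c.id, c.col2
FROM a
  JOIN b ON b.fk_a_id = a.id
  JOIN c ON c.fk_a_id = a.id
  JOIN d ON d.fk_c_id = c.id
WHERE a.owner_id NOT IN (1, 3, 100, 41)
GROUP BY a.id, b.id, c.id
ORDER BY a.created_date DESC
LIMIT 10;

ROLLBACK;  -- SET LOCAL is discarded with the transaction
```

If the forced index plan shows a higher cost and comparable or worse actual time, that's direct evidence the planner's Seq Scan choice was correct.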

What kind of hardware is that? 175 milliseconds to aggregate 20 rows seems extremely slow, as does the planning time of 210 ms. For such a simple statement, the planning time should be closer to 1 ms.
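If planning time really is the bottleneck (210 ms is a lot), a prepared statement avoids re-planning on every execution. A sketch, with the statement name and bigint parameter types assumed:

```sql
-- Plan once, execute many times. After a few executions Postgres
-- may switch to a cached generic plan and skip planning entirely.
PREPARE page_query (bigint, bigint) AS
SELECT a.id, a.col2, b.id, b.col2, c.id, c.col2
FROM a
  JOIN b ON b.fk_a_id = a.id
  JOIN c ON c.fk_a_id = a.id
  JOIN d ON d.fk_c_id = c.id
WHERE a.owner_id NOT IN (1, 3, 100, 41)
GROUP BY a.id, b.id, c.id
ORDER BY a.created_date DESC
LIMIT $1 OFFSET $2;

EXECUTE page_query (10, 0);
```

Note that prepared statements live only for the duration of the session, so this helps most when the same connection runs the query repeatedly.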

Upvotes: 2

Thorsten Kettner

Reputation: 94884

There is only one criterion on table A: WHERE a.owner_id NOT IN (1, 3, 100, 41). To me that sounds like "select all records except a few". It would be a lot of work to read all records via an index and still end up with most of them. It's way faster to simply read the table and discard the few excluded records on the fly.

Then, with those many, many A records, we may match many, many B, C, and D records. Again, it's better not to read indexes plus most of the table data; it's faster to simply put the data into buckets (hash join).

So the optimizer's choice not to use indexes for your query seems like a good one. It proves that it's doing its job :-)

The only way I see to speed this up with indexes would be covering indexes, i.e. indexes containing all desired columns:

  • a(owner_id, id, created_date, col2)
  • b(fk_a_id, id, col2)
  • c(fk_a_id, id, col2)
  • d(fk_c_id)

Then we wouldn't have to read the indexes plus the tables, but the indexes alone.
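As DDL, those covering indexes could look like the following sketch (index names are made up; on PostgreSQL 11+ the non-key payload columns could also go into an INCLUDE clause instead):

```sql
-- Covering indexes: every column the query needs is in the index,
-- so Postgres can use index-only scans and skip the heap entirely
-- (provided the visibility map is current, e.g. after VACUUM).
CREATE INDEX idx_a_covering ON a (owner_id, id, created_date, col2);
CREATE INDEX idx_b_covering ON b (fk_a_id, id, col2);
CREATE INDEX idx_c_covering ON c (fk_a_id, id, col2);
CREATE INDEX idx_d_covering ON d (fk_c_id);
```

Whether that actually wins over a Seq Scan at these table sizes is doubtful; it's mainly worth trying once the tables grow.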

BTW: since you don't select anything from D, either you want to check for existence, in which case EXISTS or IN reads better than a join in my opinion, or you don't, in which case you can drop D from the query entirely.
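A sketch of the EXISTS variant, assuming the join to d was only there to require a matching row (and if d was also the only source of duplicate rows, the GROUP BY can go as well):

```sql
SELECT a.id, a.col2, b.id, b.col2, c.id, c.col2
FROM a
  JOIN b ON b.fk_a_id = a.id
  JOIN c ON c.fk_a_id = a.id
WHERE a.owner_id NOT IN (1, 3, 100, 41)
  -- keep only c rows that have at least one matching d row,
  -- without multiplying the result set
  AND EXISTS (SELECT 1 FROM d WHERE d.fk_c_id = c.id)
ORDER BY a.created_date DESC
LIMIT __ OFFSET __;
```

Unlike a join, EXISTS can never duplicate rows, which makes the intent of the query much clearer.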

Upvotes: 0
