RoyB
RoyB

Reputation: 3164

Why is postgis not applying an index when there is a "WHERE IN" in the query?

We have a table shops, with column named location of type GEOMETRY(POINT,4326) and version of type tinyint and a table version with a single row containing an integer.

Why is below query not using the index for GIST(location)?

SELECT * FROM shops 
WHERE ("version" IN (SELECT "version" FROM "version")) 
ORDER BY (location <-> '0020000001000010e64029d460d2a8aee0404bc8bb0955ea17'::geometry) LIMIT 10;

Where as the same query without the IN does use the index?

SELECT * FROM shops 
WHERE ("version" = (SELECT "version" FROM "version" LIMIT 1)) 
ORDER BY (location <-> '0020000001000010e64029d460d2a8aee0404bc8bb0955ea17'::geometry) LIMIT 10;

This is affecting us since we updated from postgres 9 to 11. I was able to trace back the issue to the above selection.

EDIT: Add qry analyze

First query (without index application):

"Limit  (cost=25260.30..25260.32 rows=10 width=1275) (actual time=254.809..254.814 rows=10 loops=1)"
"  ->  Sort  (cost=25260.30..25260.39 rows=36 width=1275) (actual time=254.807..254.809 rows=10 loops=1)"
"        Sort Key: ((shops.location <-> '0101000020E6100000E0AEA8D260D4294017EA5509BBC84B40'::geometry))"
"        Sort Method: top-N heapsort  Memory: 54kB"
"        ->  Nested Loop  (cost=41.88..25259.52 rows=36 width=1275) (actual time=0.099..215.201 rows=58179 loops=1)"
"              Join Filter: (shops.version = version.version)"
"              ->  HashAggregate  (cost=41.88..43.88 rows=200 width=4) (actual time=0.014..0.016 rows=1 loops=1)"
"                    Group Key: version.version"
"                    ->  Seq Scan on version  (cost=0.00..35.50 rows=2550 width=4) (actual time=0.009..0.010 rows=1 loops=1)"

Second query:

"Limit  (cost=0.28..440.04 rows=10 width=1275) (actual time=0.194..0.233 rows=10 loops=1)"
"  ->  Nested Loop Semi Join  (cost=0.28..1574995.44 rows=35815 width=1275) (actual time=0.193..0.230 rows=10 loops=1)"
"        Join Filter: (shop.version = version.version)"
"        ->  Index Scan using shop_location_idx on shops  (cost=0.28..101549.81 rows=71630 width=1267) (actual time=0.182..0.213 rows=10 loops=1)"
"              Order By: (location <-> '0101000020E6100000E0AEA8D260D4294017EA5509BBC84B40'::geometry)"
"        ->  Materialize  (cost=0.00..48.25 rows=2550 width=4) (actual time=0.001..0.001 rows=1 loops=10)"
"              ->  Seq Scan on version  (cost=0.00..35.50 rows=2550 width=4) (actual time=0.006..0.006 rows=1 loops=1)"

Solved

See the answer below, with thanks to @JimJones and @JimMacaulay

Upvotes: 0

Views: 135

Answers (1)

RoyB
RoyB

Reputation: 3164

Adding an index on the version table somehow helps postgres towards using the index on the location column as well.

So adding this index is the fix for the 1st query:

CREATE INDEX version_idx
    ON public.version USING btree
    (version)
    TABLESPACE pg_default;

Correctly applying indexes on all columns inside the lookup, helps postgres to make a performant query plan.

Upvotes: 2

Related Questions