Doc
Doc

Reputation: 11

Slow query when adding an extra WHERE

I have a query that is performing as expected when running it with one or 2 filters. But when i add a third it seems the execution plan changes and kills performance.

query with 2 filters

EXPLAIN ANALYZE SELECT
   "ed_system"."name",
   "ed_system"."geom"::bytea,
   (
      "ed_system"."geom" <<->> ST_GeomFromEWKB('\x01010000a0e6100000000000000000000000000000000000000000000000000000'::bytea)
   )
   AS "distance"
FROM
   "ed_body" 
   INNER JOIN
      "ed_system" 
      ON ("ed_body"."system_id" = "ed_system"."id") 
WHERE
   (
      ST_3DDistance("ed_system"."geom", ST_GeomFromEWKB('\x01010000a0e6100000000000000000000000000000000000000000000000000000'::bytea)) < 20.0 
      AND NOT "ed_body"."ring_count" = 0
   )
ORDER BY
   "distance" ASC LIMIT 10

output

Limit  (cost=0.85..24526.33 rows=10 width=60) (actual time=17.999..34.674 rows=10 loops=1)
  ->  Nested Loop  (cost=0.85..252614946.65 rows=103001 width=60) (actual time=17.997..34.670 rows=10 loops=1)
        ->  Index Scan using ed_system_geom_id on ed_system  (cost=0.42..41629457.42 rows=6361500 width=64) (actual time=17.973..34.448 rows=12 loops=1)
              Order By: (geom <<->> '01010000A0E6100000000000000000000000000000000000000000000000000000'::geometry)
              Filter: (st_3ddistance(geom, '01010000A0E6100000000000000000000000000000000000000000000000000000'::geometry) < '20'::double precision)
        ->  Index Scan using ed_body_system_id_0b3a29a2 on ed_body  (cost=0.43..33.15 rows=2 width=4) (actual time=0.010..0.015 rows=1 loops=12)
              Index Cond: (system_id = ed_system.id)
              Filter: (ring_count <> 0)
              Rows Removed by Filter: 9
Planning time: 0.557 ms
Execution time: 34.729 ms

query with an extra filter

EXPLAIN ANALYZE SELECT
   "ed_system"."name",
   "ed_system"."geom"::bytea,
   (
      "ed_system"."geom" <<->> ST_GeomFromEWKB('\x01010000a0e6100000000000000000000000000000000000000000000000000000'::bytea)
   )
   AS "distance"
FROM
   "ed_body" 
   INNER JOIN
      "ed_system" 
      ON ("ed_body"."system_id" = "ed_system"."id") 
WHERE
   (
      ST_3DDistance("ed_system"."geom", ST_GeomFromEWKB('\x01010000a0e6100000000000000000000000000000000000000000000000000000'::bytea)) < 20.0 
      AND NOT "ed_body"."ring_count" = 0
      AND "ed_body"."scoopable" = True
   )
ORDER BY
   "distance" ASC LIMIT 10

output:

Limit  (cost=0.85..84471.16 rows=10 width=60) (actual time=238721.835..238721.835 rows=0 loops=1)
  ->  Nested Loop  (cost=0.85..252549321.31 rows=29898 width=60) (actual time=238721.832..238721.832 rows=0 loops=1)
        ->  Index Scan using ed_system_geom_id on ed_system  (cost=0.42..41629457.42 rows=6361500 width=64) (actual time=20.292..238664.727 rows=107 loops=1)
              Order By: (geom <<->> '01010000A0E6100000000000000000000000000000000000000000000000000000'::geometry)
              Filter: (st_3ddistance(geom, '01010000A0E6100000000000000000000000000000000000000000000000000000'::geometry) < '20'::double precision)
              Rows Removed by Filter: 19118611
        ->  Index Scan using ed_body_system_id_0b3a29a2 on ed_body  (cost=0.43..33.15 rows=1 width=4) (actual time=0.532..0.532 rows=0 loops=107)
              Index Cond: (system_id = ed_system.id)
              Filter: (scoopable AND (ring_count <> 0))
              Rows Removed by Filter: 12
Planning time: 0.952 ms
Execution time: 238733.393 ms

Almost identical plans except that the second plan has a shitload of filtered rows on the ed_system index scan and is sssllooowww.

Which makes me think that for some reason in the second plan the planner thinks it's better to loop through the ed_body table first and filter that before filtering on the distance join. I ran VACUUM, added extra indexes and tried a subquery but with no avail.

Any suggestions?

edit:

indexes:

CREATE UNIQUE INDEX ed_body_pkey ON ed_body(id int4_ops); 
CREATE INDEX ed_body_system_id_0b3a29a2 ON ed_body(system_id int4_ops); 
CREATE INDEX ed_body_test4 ON ed_body(ring_count int4_ops,scoopable bool_ops);

CREATE UNIQUE INDEX ed_system_pkey ON ed_system(id int4_ops); 
CREATE INDEX ed_system_geom_id ON ed_system USING GIST (geom gist_geometry_ops_nd); 
CREATE INDEX ed_system_geom_idx ON ed_system(geom btree_geometry_ops);

edit:

forced a index scan on the ed_system table. But is slow.

Index Only Scan using ed_system_geom_idx on ed_system  (cost=0.56..5581365.22 rows=358 width=4) (actual time=7.542..53948.185 rows=12 loops=1)
  Filter: ((geom && '01030000A0E6100000010000000500000000000000000024C000000000000024C000000000000024C000000000000024C0000000000000244000000000000024C0000000000000244000000000000024400000000000002440000000000000244000000000000024C0000000000000244000000000000024C000000000000024C000000000000024C0'::geometry) AND ('01010000A0E6100000000000000000000000000000000000000000000000000000'::geometry && st_expand(geom, '10'::double precision)) AND _st_3ddwithin(geom, '01010000A0E6100000000000000000000000000000000000000000000000000000'::geometry, '10'::double precision))
  Rows Removed by Filter: 19118706
  Heap Fetches: 0
Planning time: 0.709 ms
Execution time: 53948.245 ms

Upvotes: 1

Views: 201

Answers (1)

jjanes
jjanes

Reputation: 44202

The "Order By:" in the index scan looks attractive here, but is probably is not. You can force it not to use that by changing your order by clause to ORDER BY "distance" + 0.0 ASC LIMIT 10. If this does not fix the problem, then please post the EXPLAIN (ANALYZE, BUFFERS) for it, as it will at least provide us with more information that could be useful.

You said you guessed that about 20% of the rows met "ed_body"."scoopable" = True (why guess? Just run the count(*) query and know for sure). But the actual number of rows was reduced from some unknown value (because it stopped once it met the LIMIT), all the way down to zero. So either they are much more rare than that, or there is a correlation which makes them rare specifically in rows that meet the other criterion.

Upvotes: 2

Related Questions