Michael Snytko
Michael Snytko

Reputation: 337

Using GIST index with PostGIS makes query slower than without it

I have 2 sample tables with 1m rows:

create table deliveries
(
    id                 serial
        primary key,
    order_id           integer               not null
        references orders,
    point              geometry(Point, 4326) not null,
    delivery_timestamp timestamp             not null,
    year               integer               not null
);

One of them has index:

create index idx_deliveries_geom
    on deliveries using gist (point);

Sample query:

SELECT ST_AsText(point), order_id, delivery_timestamp
FROM deliveries
WHERE ST_Intersects(
              point,
              ST_MakeEnvelope(32.085065712094298, 54.87186750531864, 32.238874305844298, 54.895072980740217, 4326)
      )
ORDER BY delivery_timestamp
LIMIT 100;

Query with index works about 2.5 times slower than without index. Plan with index:

Limit  (cost=20647022.44..20647086.19 rows=100 width=44)
  ->  Result  (cost=20647022.44..21672876.60 rows=1609183 width=44)
        ->  Sort  (cost=20647022.44..20651045.39 rows=1609183 width=44)
              Sort Key: delivery_timestamp
              ->  Index Scan using idx_deliveries_geom on deliveries  (cost=0.42..20585520.62 rows=1609183 width=44)
                    Index Cond: (point && '0103000020E610000001000000050000005DBAE96EE30A4040F3E4BA5A996F4B405DBAE96EE30A4040FFE75DC091724B405DBAE96E931E4040FFE75DC091724B405DBAE96E931E4040F3E4BA5A996F4B405DBAE96EE30A4040F3E4BA5A996F4B40'::geometry)
"                    Filter: st_intersects(point, '0103000020E610000001000000050000005DBAE96EE30A4040F3E4BA5A996F4B405DBAE96EE30A4040FFE75DC091724B405DBAE96E931E4040FFE75DC091724B405DBAE96E931E4040F3E4BA5A996F4B405DBAE96EE30A4040F3E4BA5A996F4B40'::geometry)"

Plan without:

Limit  (cost=51975155.40..51975229.56 rows=100 width=44)
  ->  Gather Merge  (cost=51975155.40..53282318.69 rows=1762448 width=44)
        Workers Planned: 2
        ->  Result  (cost=51974155.37..52535935.67 rows=881224 width=44)
              ->  Sort  (cost=51974155.37..51976358.43 rows=881224 width=44)
                    Sort Key: delivery_timestamp
                    ->  Parallel Seq Scan on deliveries  (cost=0.00..51940475.62 rows=881224 width=44)
"                          Filter: st_intersects(point, '0103000020E610000001000000050000005DBAE96EE30A4040F3E4BA5A996F4B405DBAE96EE30A4040FFE75DC091724B405DBAE96E931E4040FFE75DC091724B405DBAE96E931E4040F3E4BA5A996F4B405DBAE96EE30A4040F3E4BA5A996F4B40'::geometry)"
JIT:
  Functions: 7
"  Options: Inlining true, Optimization true, Expressions true, Deforming true"

I dont understand 2 things:

I have another bigger query that only seconds this question, parallel scan works faster than index at least on my sample data. Data is generated randomly around points 32.0401, 54.7818 (the logic for generation is slightly harder, but all points are in about 10x10 km square).

Upvotes: 0

Views: 23

Answers (0)

Related Questions