Reputation: 119
So I have a DB on Aurora r5.2xl and I created a duplicate DB on an i3.2xl (tons of IOPS). This one query for some reason takes 10x longer on the new server than on Aurora, even though all the configuration variables are set exactly the same.
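For what it's worth, "set exactly the same" here means the non-default GUCs match on both servers; a quick way to check that is to dump them on each side and diff the output:
-- list every setting that is not at its default value; run on both servers and diff
SELECT name, setting
FROM pg_settings
WHERE source <> 'default'
ORDER BY name;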
Here is the query:
explain analyze SELECT a.id,
count(b.id) as hotel_count,
array_agg(b.id) as hotel_ids
FROM autocomplete a,
hotel b
WHERE a.type = 'city'
AND st_covers(a.poly, b.coords)
AND st_distance(a.coords, b.coords) < 40000
AND b.last_seen IS NOT NULL
AND b.enabled = true
GROUP BY a.id
limit 10;
Here are the Aurora results:
Limit  (cost=0.83..1100.64 rows=10 width=56) (actual time=0.522..2.534 rows=10 loops=1)
  ->  GroupAggregate  (cost=0.83..3696781.84 rows=33613 width=56) (actual time=0.521..2.531 rows=10 loops=1)
        Group Key: a.id
        ->  Nested Loop  (cost=0.83..3668393.17 rows=3729135 width=32) (actual time=0.357..2.506 rows=69 loops=1)
              ->  Index Scan using "PK_5523204bb8469c2025bcb0b55bc" on autocomplete a  (cost=0.42..188879.01 rows=33613 width=176) (actual time=0.025..0.280 rows=14 loops=1)
                    Filter: (type = 'city'::autocomplete_type_enum)
                    Rows Removed by Filter: 133
              ->  Index Scan using hotel_coords_idx on hotel b  (cost=0.41..103.49 rows=3 width=48) (actual time=0.109..0.158 rows=5 loops=14)
                    Index Cond: (a.poly && coords)
                    Filter: ((last_seen IS NOT NULL) AND enabled AND _st_covers(a.poly, coords) AND (_st_distance(a.coords, coords, '0'::double precision, true) < '40000'::double precision))
                    Rows Removed by Filter: 7
Planning time: 26.210 ms
Execution time: 2.590 ms
And here are the new server's results (there's no iowait, but CPU burns at 100% the whole time if I set the limit higher):
Limit  (cost=5215.44..28517.08 rows=10 width=56) (actual time=124.928..484.664 rows=10 loops=1)
  ->  GroupAggregate  (cost=5215.44..80034722.29 rows=34345 width=56) (actual time=124.926..484.643 rows=10 loops=1)
        Group Key: a.id
        ->  Gather Merge  (cost=5215.44..80000159.03 rows=4551193 width=32) (actual time=89.883..559.222 rows=69 loops=1)
              Workers Planned: 2
              Workers Launched: 2
              ->  Nested Loop  (cost=4215.41..79473838.41 rows=1896330 width=32) (actual time=38.777..468.633 rows=162 loops=3)
                    ->  Parallel Index Scan using "PK_5523204bb8469c2025bcb0b55bc" on autocomplete a  (cost=0.42..92511.37 rows=14310 width=176) (actual time=0.017..0.159 rows=12 loops=3)
                          Filter: (type = 'city'::autocomplete_type_enum)
                          Rows Removed by Filter: 84
                    ->  Bitmap Heap Scan on hotel b  (cost=4214.99..5547.18 rows=8 width=48) (actual time=37.861..37.959 rows=13 loops=37)
                          Recheck Cond: (last_seen IS NOT NULL)
                          Rows Removed by Index Recheck: 11
                          Filter: (enabled AND st_covers(a.poly, coords) AND (st_distance(a.coords, coords, true) < '40000'::double precision))
                          Rows Removed by Filter: 1
                          Heap Blocks: exact=107
                          ->  BitmapAnd  (cost=4214.99..4214.99 rows=26 width=0) (actual time=37.759..37.759 rows=0 loops=37)
                                ->  Bitmap Index Scan on hotel_coords_idx  (cost=0.00..2.60 rows=117 width=0) (actual time=0.910..0.910 rows=59 loops=37)
                                      Index Cond: (coords && a.poly)
                                ->  Bitmap Index Scan on hotel_last_seen_idx  (cost=0.00..4149.68 rows=258652 width=0) (actual time=36.268..36.268 rows=262908 loops=37)
Planning Time: 1.815 ms
Execution Time: 559.530 ms
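The big difference I can see is that BitmapAnd pulling in hotel_last_seen_idx (~262k rows) on every one of the 37 loops. As a session-level experiment (not a fix), this is how I'd check whether that bitmap plan is what's burning the CPU:
-- experiment only: steer the planner away from the bitmap plan for this session,
-- so it can probe hotel_coords_idx directly the way the Aurora plan does
SET enable_bitmapscan = off;
EXPLAIN ANALYZE
SELECT a.id, count(b.id) AS hotel_count, array_agg(b.id) AS hotel_ids
FROM autocomplete a, hotel b
WHERE a.type = 'city'
  AND st_covers(a.poly, b.coords)
  AND st_distance(a.coords, b.coords) < 40000
  AND b.last_seen IS NOT NULL
  AND b.enabled = true
GROUP BY a.id
LIMIT 10;
RESET enable_bitmapscan;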
Any idea why this would be the case? I'm at a loss. The new server actually beats Aurora on a bunch of queries, including ones like "explain analyze select name from hotel where last_seen is not null and enabled = true group by name limit 1000;", but for some reason the query above just runs so much slower.
Upvotes: 2
Views: 184
Reputation: 119
OK, so I found out that PG12 was just optimizing that query really poorly, no matter what I tried. When I rewrote it like this, it ran just as fast:
WITH x AS
(
SELECT id, poly, coords
FROM autocomplete
WHERE type = 'city'
)
SELECT x.id,
sum(CASE WHEN last_seen IS NOT NULL THEN 1 ELSE 0 END) AS hotel_count,
array_agg(CASE WHEN last_seen IS NOT NULL THEN b.id ELSE NULL END) AS hotel_ids
FROM x, hotel b
WHERE st_covers(x.poly, b.coords)
AND st_distance(x.coords, b.coords) < 40000
AND b.enabled = true
GROUP BY x.id;
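One note on the CTE: PG12 inlines a single-reference CTE like this by default, so if the fast plan turns out to depend on materializing x first, the old optimization-fence behaviour can be pinned with the MATERIALIZED keyword (same query, one extra word):
WITH x AS MATERIALIZED
(
SELECT id, poly, coords
FROM autocomplete
WHERE type = 'city'
)
SELECT x.id,
sum(CASE WHEN last_seen IS NOT NULL THEN 1 ELSE 0 END) AS hotel_count,
array_agg(CASE WHEN last_seen IS NOT NULL THEN b.id ELSE NULL END) AS hotel_ids
FROM x, hotel b
WHERE st_covers(x.poly, b.coords)
AND st_distance(x.coords, b.coords) < 40000
AND b.enabled = true
GROUP BY x.id;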
PG12 was also doing a full index scan on last_seen on every loop, which made it take forever, so for some reason it was a LOT cheaper to remove last_seen from the WHERE clause and just fold it into the count condition.
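Another option that might let the original query keep a single index probe per city (untested here, and the index name is made up) is a partial GiST index on coords that bakes the last_seen/enabled filters in, so the planner never reaches for hotel_last_seen_idx at all:
-- hypothetical partial index; assumes hotel_coords_idx is a PostGIS GiST index on coords
CREATE INDEX hotel_coords_active_idx
ON hotel USING gist (coords)
WHERE last_seen IS NOT NULL AND enabled;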
Upvotes: 2