Reputation: 370
I have two database servers, both in "test" mode, with one expected to be promoted to production. As such, the specs differ, as do some of the configs, but we are finding that the underpowered server produces better query plans and thus faster queries.
Stats:
Both systems have roughly the same data, and the sizes break down like this:
Size    | Part
--------+----------
1.47 TB | Entire DB
871 GB  | Tables
635 GB  | Indexes
The bigger DB server has the following specs:
RAM: 500 GB
CPU: 16 Cores 2.0 GHz Intel
Using SSDs
Postgres 10.0
Memlock set to reserve 485 GB specifically for Postgres
Postgres Settings:
shared_buffers: 125 GB
work_mem: 36 MB
effective_cache_size: 300 GB
random_page_cost: 1
default_statistics_target: 1000
Query Plan: https://explain.depesz.com/s/9Ww6
The smaller server has the following specs:
RAM: 281 GB
CPU: 4 Cores 2.0 GHz Intel
Using SSDs
Postgres 10.0
Memlock set to reserve 240 GB specifically for Postgres
Postgres Settings:
shared_buffers: 50 GB
work_mem: 25.6 MB
effective_cache_size: 150 GB
random_page_cost: 4
default_statistics_target: 100
Query Plan: https://explain.depesz.com/s/4WUH
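To see exactly which settings differ beyond the ones listed above, a standard pg_settings query can be run on both servers and the output diffed (this lists every setting that deviates from its compiled-in default):
SELECT name, setting, unit
FROM pg_settings
WHERE source <> 'default'
ORDER BY name;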
We've tried matching random_page_cost, default_statistics_target (followed by an ANALYZE), and work_mem between the servers. The biggest gain came after running a VACUUM FULL on all tables involved in the query.
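For reference, a minimal sketch of that matching experiment, assuming session-level overrides suffice and that these statements can be run on this machine (the table name is taken from the plans; repeat for every table in the query):
-- mimic the smaller server's planner settings for this session only
SET random_page_cost = 4;
SET default_statistics_target = 100;
SET work_mem = '26214kB';  -- about 25.6 MB, matching the smaller server
-- re-gather statistics under the new target, then rewrite the table
-- (VACUUM FULL takes an ACCESS EXCLUSIVE lock on the table)
ANALYZE showtimes.mappable_program;
VACUUM FULL showtimes.mappable_program;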
Workload: This machine is a read replica used to extract the data as XML files and the like, so it receives replicated data and carries a fairly heavy read load.
Question: What should I be looking for to make this query perform better on the larger server, where it currently runs slower? Ideally it would run much faster there than on the smaller server. It appears that as we have scaled, we have failed to adjust the settings to take advantage of our hardware. There must be something I'm overlooking.
Edit: I put the non-obfuscated plans up. I have also tried bumping default_statistics_target from 1000 to 3000, and it didn't help the plans. The same goes for changing random_page_cost to match between the servers.
Upvotes: 1
Views: 250
Reputation: 246093
The PostgreSQL configuration on the two machines is quite different, so it is not surprising that the query plans differ. In particular, random_page_cost has a big impact.
You definitely should benchmark your workload with different shared_buffers settings: yours is probably way too high (normally, 8GB is appropriate).
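A minimal sketch of such a test (shared_buffers only takes effect after a restart, and 8GB here is just the starting point suggested above, not a measured optimum):
ALTER SYSTEM SET shared_buffers = '8GB';
-- restart PostgreSQL; shared_buffers cannot be changed while the server is running
-- then re-run the workload and compare timings:
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;  -- your slow query here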
But I think that both your query plans are terrible, and your problem is in a different spot.
The optimizer misestimates the number of rows returned from an index scan on showtimes.mappable_program, and this misestimate causes even worse misestimates and bad plan choices later on.
Try increasing the density of the statistics on the column:
ALTER TABLE showtimes.mappable_program
    ALTER mapping_scheme_id SET STATISTICS 1000;
Then ANALYZE the table.
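To verify that the per-column target is in place after the ANALYZE, you can check the catalog directly (pg_attribute.attstattarget holds per-column overrides; -1 means the default is used):
ANALYZE showtimes.mappable_program;
SELECT attname, attstattarget
FROM pg_attribute
WHERE attrelid = 'showtimes.mappable_program'::regclass
  AND attname = 'mapping_scheme_id';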
If that doesn't do the trick, modify the query by replacing
WHERE COALESCE(mp2.ignore::integer, 0) = 0
with
WHERE mp2.ignore = '0' OR mp2.ignore IS NULL
That might help the optimizer estimate the condition better.
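To see whether the rewrite actually improves the estimate, the two predicates can be compared in isolation; a rough sketch, with the rest of the real query omitted and count(*) as a stand-in:
EXPLAIN ANALYZE SELECT count(*) FROM showtimes.mappable_program mp2
WHERE COALESCE(mp2.ignore::integer, 0) = 0;

EXPLAIN ANALYZE SELECT count(*) FROM showtimes.mappable_program mp2
WHERE mp2.ignore = '0' OR mp2.ignore IS NULL;
The reason the rewrite can help: the planner has no statistics for the expression COALESCE(mp2.ignore::integer, 0) and falls back on a default selectivity, whereas plain comparisons against the column itself can use its gathered statistics.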
Upvotes: 1