Reputation: 31
Upgrading Postgres from 8.3.8 to 9.4.1 on new hardware. A representative set of queries shows that the new system ranges from 1x to 3x faster. However, one of our high-load areas is always slower.
EXPLAIN output, 8.3.8:
Nested Loop (cost=25.78..709859.61 rows=1 width=4) (actual time=14.972..190.591 rows=32 loops=1)
-> Bitmap Heap Scan on prime p (cost=25.78..1626.92 rows=1066 width=4) (actual time=1.567..9.597 rows=10742 loops=1)
Recheck Cond: ((pid = ANY ('{28226,53915,83421,82118397,95513866}'::integer[])) AND (tid = ANY ('{1,2,3}'::integer[])))
Filter: (NOT deleted)
-> Bitmap Index Scan on FOO_IDX1 (cost=0.00..25.73 rows=1066 width=0) (actual time=1.144..1.144 rows=10742 loops=1)
Index Cond: ((pid = ANY ('{28226,53915,83421,82118397,95513866}'::integer[])) AND (deleted = false) AND (tid = ANY ('{1,2,3}'::integer[])))
-> Index Scan using FOO_IDX2 on data d (cost=0.00..663.88 rows=1 width=4) (actual time=0.017..0.017 rows=0 loops=10742)
Index Cond: (d.pid = p.pid)
Filter: (lower("substring"(d.value, 1, 1000)) ~~ '%something%'::text)
Total runtime: 190.639 ms
9.4.1:
Nested Loop (cost=1.15..335959.94 rows=1 width=4) (actual time=24.712..365.057 rows=32 loops=1)
-> Index Scan using FOO_IDX1 on prime p (cost=0.57..953.17 rows=1033 width=4) (actual time=0.048..13.884 rows=10741 loops=1)
Index Cond: ((pid = ANY ('{28226,53915,83421,82118397,95513866}'::integer[])) AND (deleted = false) AND (tid = ANY ('{1,2,3}'::integer[])))
Filter: (NOT deleted)
-> Index Scan using FOO_IDX2 on data d (cost=0.57..324.29 rows=1 width=4) (actual time=0.032..0.032 rows=0 loops=10741)
Index Cond: (pid = p.pid)
Filter: (lower("substring"(value, 1, 1000)) ~~ '%something%'::text)
Rows Removed by Filter: 11
Planning time: 0.940 ms
Execution time: 365.156 ms
…btree (pid);
…btree (lower("substring"(value, 1, 1000)) text_pattern_ops, fid);
…btree (lower("substring"(value, 1, 1000)), fid);
Varying the ranges of the following did not improve this case…
checkpoint_completion_target = 0.5
checkpoint_segments = 32
checkpoint_timeout = 30min
cpu_index_tuple_cost = 0.005
cpu_operator_cost = 0.0025
cpu_tuple_cost = 0.01
default_statistics_target = 500 (evaluated 100 to 10000; ANALYZE after each)
effective_cache_size = 288GB
enable_seqscan = off
from_collapse_limit = 8
geqo = off
join_collapse_limit = 8
random_page_cost = 1.0
seq_page_cost = 1.0
shared_buffers = 96GB
work_mem = 64MB
We see a similar result for the anchored pattern something% too.
Before we drop this in for a few years I'd like to know if there is anything more I could do to optimise for these important cases.
SELECT p.pid
FROM prime p
INNER JOIN data d ON p.pid = d.pid
WHERE LOWER(substring(d.value,1,1000)) LIKE '%something%'
AND p.tid IN (1,2,3)
AND p.deleted = FALSE
AND p.ppid IN (28226, 53915, 83421, 82118397, 95513866)
Simplified and sanitized.
\d prime
Column | Type | Modifiers
---------------+-----------------------------+-------------------------------------------------
pid | integer | not null default nextval('prime_seq'::regclass)
deleted | boolean |
ppid | integer |
tid | integer |
\d data
Column | Type | Modifiers
----------------+---------+------------------------------------------------------
pdid | integer | not null default nextval('data_seq'::regclass)
pid | integer |
value | text |
I've tried a range of default_statistics_target.
default_statistics_target = 100 @ 381 ms
default_statistics_target = 500 @ 387 ms
default_statistics_target = 1000 @ 384 ms
default_statistics_target = 5000 @ 369 ms
(analyze and warm-up between test cycles)
This value can make a substantial difference in other areas of our application. 500 seems ideal; 5000+ caused other areas to slow down 3x to 10x.
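Since only a few columns drive this query, one option worth trying is a per-column statistics target instead of raising the global default. A sketch, using the column names from the simplified schema (the target value of 1000 is just an example to tune):

```sql
-- Raise the statistics target only for the columns this query filters on,
-- leaving default_statistics_target at its application-wide sweet spot.
ALTER TABLE prime ALTER COLUMN ppid SET STATISTICS 1000;
ALTER TABLE prime ALTER COLUMN tid  SET STATISTICS 1000;
ANALYZE prime;  -- re-collect statistics so the new per-column target takes effect
```

The per-column setting overrides default_statistics_target for those columns only, so other areas of the application keep their current planning cost.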
Our kit is designed such that the entire database should always be in memory.
random_page_cost = 1.0 @ 372 ms
random_page_cost = 1.1 @ 372 ms
random_page_cost = 4.0 @ 370 ms
random_page_cost = 10.0 @ 369 ms
With enable_bitmapscan = off @ 362 ms (results in the same plan as expected)
Earlier I also tried enable_indexscan = off @ 491 ms (triggered different plan of course)
Yes, pg 8.3's plan uses an index and a bitmap index scan - which I think is the "nut" of this issue.
Thanks for the link to the related article.
The advice on column order is very interesting.
At our scale and growing, what would the optimal field order be for the following schemas?
What is the most effective way to reorganize the column order on a loaded table to realize the benefit?
prime having:
integer
text
boolean
boolean
integer
integer
smallint
integer
timestamp without time zone
timestamp without time zone
timestamp without time zone
text
data having:
integer
integer
integer
text
SELECT pid
FROM data d
JOIN prime p USING (pid)
WHERE LOWER(substring(d.value,1,1000)) LIKE '%something%'
AND p.ppid IN (28226, 53915, 83421, 82118397, 95513866)
AND p.tid IN (1, 2, 3)
AND p.deleted = FALSE;
lower(substring(d.value,1,1000)) = 355 ms
lower(left(d.value,1000)) = 343 ms (~3% faster over multiple tests, I'll take that!)
To handle the unanchored case we have a second index using the operator class "text_pattern_ops".
We've evaluated multi-column GIN indexes before, but didn't realize the expected benefits. It's complex because (A) there are multiple criteria to meet on acl, status, and similar, and (B) we need to hit "exact phrases", which requires a recheck of the result phrase. I'm optimistic about a full-text approach long term, but the recipes we've tried so far are not faster or more stable than the old-school B-tree approach yet.
GIN trial 1
CREATE EXTENSION btree_gin;
CREATE INDEX FOO_IDX3 ON data USING gin (to_tsvector('simple', lower(left(value, 1000))), pid);
ANALYZE data;
SELECT p.pid
FROM prime p
INNER JOIN data d ON p.pid = d.pid
WHERE to_tsvector('simple', lower(left(d.value, 1000))) @@ to_tsquery('simple', 'something')
AND p.tid IN (1,2,3)
AND p.deleted = FALSE
AND p.ppid IN (28226, 53915, 83421, 82118397, 95513866)
Execution time: 1034.866 ms (without phrase recheck)
GIN trial 2
CREATE EXTENSION pg_trgm;
CREATE INDEX FOO_IDX4 ON data USING gin (left(value,1000) gin_trgm_ops, pid);
ANALYZE data;
SELECT p.pid
FROM prime p
INNER JOIN data d ON p.pid = d.pid
WHERE left(d.value,1000) LIKE '%Something%'
AND p.tid IN (1,2,3)
AND p.deleted = FALSE
AND p.ppid IN (28226, 53915, 83421, 82118397, 95513866)
Hash Join (cost=2870.42..29050.89 rows=1 width=4) (actual time=668.333..2262.101 rows=32 loops=1)
Hash Cond: (d.pid = p.pid)
-> Bitmap Heap Scan on data d (cost=230.30..26250.04 rows=25716 width=4) (actual time=653.130..2234.736 rows=38659 loops=1)
Recheck Cond: ("left"(value, 1000) ~~ '%Something%'::text)
Rows Removed by Index Recheck: 146677
Heap Blocks: exact=161810
-> Bitmap Index Scan on FOO_IDX4 (cost=0.00..223.87 rows=25716 width=0) (actual time=575.442..575.442 rows=185336 loops=1)
Index Cond: ("left"(value, 1000) ~~ '%Something%'::text)
-> Hash (cost=2604.33..2604.33 rows=2863 width=4) (actual time=15.158..15.158 rows=10741 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 378kB
-> Index Scan using FOO_IDX5 on prime p (cost=0.57..2604.33 rows=2863 width=4) (actual time=0.064..11.737 rows=10741 loops=1)
Index Cond: ((ppid = ANY ('{28226,53915,83421,82118397,95513866}'::integer[])) AND (deleted = false) AND (tid = ANY ('{1,2,3}'::integer[])))
Filter: (NOT deleted)
Planning time: 1.861 ms
Execution time: 2262.210 ms
We already have an index on prime w/ "ppid, deleted, tid", sorry this was not clear originally.
Upvotes: 3
Views: 486
Reputation: 656706
The most common reasons for bad query plans are column statistics or cost settings not representing reality well:
A setting of random_page_cost = 1.0 only makes sense if random access is actually as fast as sequential access, which is only true if your DB resides entirely in RAM. A database with tables of 80M and 750M rows is probably too big for that. If my assumption is right, raising that cost setting a bit might fix the problem. Try at least 1.1, probably more. Run tests to find the sweet spot in your setup.
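Candidate values can be tested without touching postgresql.conf by using session-local settings. A sketch, with the question's (simplified) query as the test subject:

```sql
-- Try a candidate cost setting for this transaction only, then time the query.
BEGIN;
SET LOCAL random_page_cost = 1.5;   -- candidate value under test
EXPLAIN (ANALYZE, BUFFERS)
SELECT p.pid
FROM   prime p
JOIN   data d ON p.pid = d.pid
WHERE  lower(left(d.value, 1000)) LIKE '%something%'
AND    p.tid IN (1, 2, 3)
AND    p.deleted = FALSE
AND    p.ppid IN (28226, 53915, 83421, 82118397, 95513866);
ROLLBACK;  -- SET LOCAL expires with the transaction; nothing persists
```

This keeps the production setting untouched while you compare plans and timings across values.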
Normally I would first run:
SET enable_bitmapscan = off;
in the current session in pg 9.4 and then test again. The tricky part is that your query may need both: an index scan and a bitmap index scan. I would need to see the query.
Extremely low settings for random_page_cost favor index scans over bitmap index scans. If that cost setting is misleading, you get inferior query plans.
Answers to this related question on dba.SE have more explanation.
Your table design is simplified, but generally it's better not to place a boolean column between integer columns, because you waste disk space on alignment padding that way. Better:
pid | integer | not null default nextval('prime_seq'::regclass)
tid | integer |
deleted | boolean |
This is just a minor improvement, but it has no downside.
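Postgres has no ALTER TABLE option to reorder columns in place, so on a loaded table the way to realize the saving is to rebuild the table. A sketch under the simplified four-column schema (the real table has more columns; names here are illustrative), best done in a maintenance window since it rewrites the table under an exclusive lock:

```sql
BEGIN;
-- Rebuild with the padding-friendly column order: fixed-width integers first,
-- the boolean last, so no alignment padding is inserted between columns.
CREATE TABLE prime_new (
    pid     integer NOT NULL DEFAULT nextval('prime_seq'),
    ppid    integer,
    tid     integer,
    deleted boolean
);
INSERT INTO prime_new (pid, ppid, tid, deleted)
SELECT pid, ppid, tid, deleted FROM prime;

DROP TABLE prime;
ALTER TABLE prime_new RENAME TO prime;
-- Recreate indexes, constraints, and grants here before committing.
COMMIT;
```

The transaction keeps the swap atomic for readers, though writers are blocked for its duration.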
Can be improved in multiple ways:
SELECT pid
FROM data d
JOIN prime p USING (pid)
WHERE left(d.value,1000) LIKE '%something%'
AND p.ppid IN (28226, 53915, 83421, 82118397, 95513866)
AND p.tid IN (1, 2, 3)
AND p.deleted = false;
left(d.value,1000) is shorter and faster than substring(d.value,1,1000) (needs pg 9.1+).
A text_pattern_ops index is only useful for left-anchored pattern matching with LIKE. Your expression is not anchored. (I have seen you are using an anchored pattern, too.) For this, use a trigram GIN index, provided by the additional module pg_trgm; it's hugely faster with big tables, especially in pg 9.4 (improved GIN indexes).
To include the integer column pid in the GIN index below, first install the additional module btree_gin, which provides the necessary GIN operator classes. Run once per database:
CREATE EXTENSION btree_gin;
Making some assumptions, this would be perfect for your query. A multicolumn trigram GIN index on data:
CREATE INDEX data_value_gin_trgm_idx ON data
USING gin (left(value,1000) gin_trgm_ops, pid);
And a partial multicolumn index on prime:
CREATE INDEX prime_ppid_tid_idx ON prime (ppid, tid)
WHERE deleted = false;
We're talking about an order-of-magnitude improvement here.
Upvotes: 2
Reputation: 31
A slightly different approach: first produce the maximum possible result set from the data table via a Common Table Expression (CTE), then join back onto prime for refinement by acl, status, etc. This reduced time from 365 ms to 142 ms (a saving of 223 ms). The technique also appears to be faster than the 8.3 baseline.
WITH d AS (
    SELECT pid
    FROM   data
    WHERE  LOWER(left(value,1000)) LIKE '%something%'
    AND    fid IN (nnn,nnn,...)
)
SELECT p.pid
FROM   d
JOIN   prime p ON p.pid = d.pid
WHERE  p.tid IN (1,2,3)
AND    p.deleted = FALSE
AND    p.ppid IN (28226,53915,83421,82118397,95513866);
Planning time: 1.417 ms
Execution time: 141.508 ms
I'll further evaluate the CTE for unexpected implications. (In 9.4 a CTE is an optimization fence, which is what forces this evaluation order.)
Upvotes: 0