Reputation: 1522
I have 3 tables: requests, steps, and payloads. Each request has N steps (one-to-many), and each step has 1 payload (one-to-one).
Now, whenever I want to filter by payload body, execution time is terrible.
Here's the simplified request:
select rh.id
from request_history_step_payload rhsp
join request_history_step rhs on rhs.id = rhsp.step_id
join request_history rh on rhs.request_id = rh.id
where rh.id > 35000 and rhs.step_type = 'CONSUMER_REQUEST' and rhsp.payload like '%09141%'
and here's its EXPLAIN ANALYZE (run immediately after VACUUM ANALYZE):
Nested Loop (cost=0.71..50234.28 rows=1 width=8) (actual time=120.093..2494.929 rows=12 loops=1)
-> Nested Loop (cost=0.42..50233.32 rows=3 width=8) (actual time=120.083..2494.900 rows=14 loops=1)
-> Seq Scan on request_history_step_payload rhsp (cost=0.00..50098.28 rows=16 width=8) (actual time=120.063..2494.800 rows=25 loops=1)
Filter: ((payload)::text ~~ '%09141%'::text)
Rows Removed by Filter: 164512
-> Index Scan using request_history_step_pkey on request_history_step rhs (cost=0.42..8.44 rows=1 width=16) (actual time=0.003..0.003 rows=1 loops=25)
Index Cond: (id = rhsp.step_id)
Filter: ((step_type)::text = 'CONSUMER_REQUEST'::text)
Rows Removed by Filter: 0
-> Index Only Scan using request_history_pkey on request_history rh (cost=0.29..0.32 rows=1 width=8) (actual time=0.001..0.001 rows=1 loops=14)
Index Cond: ((id = rhs.request_id) AND (id > 35000))
Heap Fetches: 0
Planning Time: 0.711 ms
Execution Time: 2494.964 ms
Now, I would like to somehow hint the planner to apply the LIKE operation as the LAST step, but I can't think of any way to do so. I've tried all kinds of joins, reordering them, and moving conditions between the ON clauses and the WHERE clause. All to no avail! In every case it first scans ALL rows of the payload table, which is the worst possible order, given that the other conditions could drastically reduce the number of LIKE comparisons needed. I would expect it to apply the id condition first, which already filters out about 90% of all records; then the step_type condition, which filters out about 85% of the rest; and thus apply the LIKE condition to less than 5% of all payloads.
How would I go about this? I am using Postgres 11.
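(To sanity-check those selectivity guesses, each predicate can be counted in isolation — table and column names as in the post, thresholds from my query:)

```sql
-- rough selectivity of each predicate on its own
select count(*) from request_history where id > 35000;
select count(*) from request_history_step where step_type = 'CONSUMER_REQUEST';
select count(*) from request_history_step_payload where payload like '%09141%';
```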
UPD: I was asked to add index information for these tables, so:
- request_history has 2 unique indexes on the id field (I have no idea why there are 2 of them)
- request_history_step has 2 unique indexes, both on the id field
- request_history_step_payload has 1 unique index on the id field
UPD2: the step and payload tables also have FKs defined (payload.step_id -> step.id and step.request_id -> request.id, respectively)
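(For reference, a minimal sketch of the schema as described above — column types are assumptions on my part, and only the columns mentioned in the post are shown:)

```sql
create table request_history (
    id bigint primary key
);

create table request_history_step (
    id         bigint primary key,
    request_id bigint not null references request_history (id),
    step_type  varchar not null
);

create table request_history_step_payload (
    id      bigint primary key,
    step_id bigint not null references request_history_step (id),
    payload varchar
);
```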
I've also tried several (simplified) queries w/sub-SELECT:
explain analyze select rhs.id from request_history_step rhs
join (select step_id from request_history_step_payload rhsp where rhsp.payload like '%09141%') rhsp on rhs.id = rhsp.step_id
where rhs.step_type = 'CONSUMER_REQUEST';
explain analyze select rhsp.step_id from request_history_step_payload rhsp
join (select id from request_history_step rhs where rhs.step_type = 'CONSUMER_REQUEST') rhs on rhs.id = rhsp.step_id
where rhsp.payload like '%09141%';
explain analyze select rhsp.step_id from request_history_step_payload rhsp
where rhsp.step_id in (select id from request_history_step rhs where rhs.step_type = 'CONSUMER_REQUEST')
and rhsp.payload like '%09141%';
(also with JOIN LATERAL instead of just JOIN) - each of them gives the exact same plan: a nested loop, inside it another nested loop, and the "outer" (first) leg of that loop is a SeqScan. Which drives me nuts. Why would it execute the most expensive operation against the widest possible set of rows??
UPD3: inspired by the comments under the original question, I've run some further experiments. I settled on a simpler query:
select rhs.request_id
from request_history_step_payload rhsp
join request_history_step rhs on rhs.id = rhsp.step_id
where rhs.step_type = 'CONSUMER_REQUEST' and rhsp.payload like '%09141%';
and its execution plan was essentially the same as for the original one, just with one "Nested Loop" fewer.
Now, I've added an index on payload.step_id:
create index request_history_step_payload_step_id on request_history_step_payload(step_id);
Ran VACUUM ANALYZE; ran the query with explain analyze - nothing changes. Hmmm.
Now I've run set enable_seqscan to off. And NOW WE'RE TALKING:
Gather (cost=1000.84..88333.90 rows=3 width=8) (actual time=530.273..589.650 rows=14 loops=1)
Workers Planned: 1
Workers Launched: 1
-> Nested Loop (cost=0.84..87333.60 rows=2 width=8) (actual time=544.639..580.608 rows=7 loops=2)
-> Parallel Index Scan using request_history_step_pkey on request_history_step rhs (cost=0.42..15913.04 rows=20867 width=16) (actual time=0.029..28.667 rows=17620 loops=2)
Filter: ((step_type)::text = 'CONSUMER_REQUEST'::text)
Rows Removed by Filter: 64686
-> Index Scan using request_history_step_payload_step_id on request_history_step_payload rhsp (cost=0.42..3.41 rows=1 width=8) (actual time=0.031..0.031 rows=0 loops=35239)
Index Cond: (step_id = rhs.id)
Filter: ((payload)::text ~~ '%09141%'::text)
Rows Removed by Filter: 1
Planning Time: 0.655 ms
Execution Time: 589.688 ms
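(Note: enable_seqscan = off is a diagnostic tool rather than a fix; if used at all, it can at least be scoped to a single transaction with SET LOCAL so other queries are unaffected — a sketch:)

```sql
begin;
set local enable_seqscan = off;  -- reverts automatically at commit/rollback
select rhs.request_id
from request_history_step_payload rhsp
join request_history_step rhs on rhs.id = rhsp.step_id
where rhs.step_type = 'CONSUMER_REQUEST' and rhsp.payload like '%09141%';
commit;
```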
Now, with the cost of the SeqScan going through the roof, I think we can see the gist of the problem: the estimated cost of this execution plan is higher than that of the original one (88k vs 50k), even though the actual execution time is far shorter (590ms vs ~2500ms). And this is apparently why the Postgres planner keeps choosing "SeqScan first" despite all my efforts to convince it otherwise.
I've also tried adding indexes on the step.step_type field, both hash- and btree-based. Each of those would still produce a plan with an estimated cost well above 50k, so with enable_seqscan set to on (the default), the planner would always ignore them.
Does anyone know any mitigation for this? I am afraid the proper solution might require changing the planner cost weights, which I am not inclined to do, of course. But I would be glad to hear any suggestions!
UPD4: now that I've played some more, I can report further results (with enable_seqscan set to on):
This one is slow and applies a seqscan, even though there's an index on step.step_type:
explain analyze
select rhsp.step_id
from (select request_id, id from request_history_step rhs2 where rhs2.step_type = 'CONSUMER_REQUEST') rhs
join request_history_step_payload rhsp on rhs.id = rhsp.step_id
where rhsp.payload like '%09141%';
this one, based on the suggestion by O. Jones, is still slow:
explain analyze
with rhs as (select request_id, id from request_history_step rhs2 where rhs2.step_type = 'CONSUMER_REQUEST')
select rhsp.step_id from request_history_step_payload rhsp
join rhs on rhs.id = rhsp.step_id
where rhsp.payload like '%09141%';
but this one, slightly modified, is fast:
explain analyze
with rhs as (select request_id, id from request_history_step rhs2 where rhs2.step_type = 'CONSUMER_REQUEST')
select rhsp.step_id
from request_history_step_payload rhsp
join rhs on rhs.id = rhsp.step_id
where rhsp.step_id in (select id from rhs) and rhsp.payload like '%09141%';
its execution plan is:
Hash Join (cost=9259.55..10097.04 rows=2 width=8) (actual time=1157.984..1162.199 rows=14 loops=1)
Hash Cond: (rhs.id = rhsp.step_id)
CTE rhs
-> Bitmap Heap Scan on request_history_step rhs2 (cost=1169.28..6918.06 rows=35262 width=16) (actual time=3.899..19.093 rows=35241 loops=1)
Recheck Cond: ((step_type)::text = 'CONSUMER_REQUEST'::text)
Heap Blocks: exact=3120
-> Bitmap Index Scan on request_history_step_step_type_hash (cost=0.00..1160.46 rows=35262 width=0) (actual time=3.047..3.047 rows=35241 loops=1)
Index Cond: ((step_type)::text = 'CONSUMER_REQUEST'::text)
-> CTE Scan on rhs (cost=0.00..705.24 rows=35262 width=8) (actual time=3.903..5.976 rows=35241 loops=1)
-> Hash (cost=2341.39..2341.39 rows=8 width=16) (actual time=1153.976..1153.976 rows=14 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Nested Loop (cost=793.81..2341.39 rows=8 width=16) (actual time=104.170..1153.919 rows=14 loops=1)
-> HashAggregate (cost=793.39..795.39 rows=200 width=8) (actual time=33.315..44.875 rows=35241 loops=1)
Group Key: rhs_1.id
-> CTE Scan on rhs rhs_1 (cost=0.00..705.24 rows=35262 width=8) (actual time=0.001..23.590 rows=35241 loops=1)
-> Index Scan using request_history_step_payload_step_id on request_history_step_payload rhsp (cost=0.42..7.72 rows=1 width=8) (actual time=0.031..0.031 rows=0 loops=35241)
Index Cond: (step_id = rhs_1.id)
Filter: ((payload)::text ~~ '%09141%'::text)
Rows Removed by Filter: 1
Planning Time: 1.318 ms
Execution Time: 1162.618 ms
Again, the estimated costs are down dramatically, while the execution time is not down by nearly as much.
Upvotes: 3
Views: 677
Reputation: 108641
From your plan it looks like the LIKE predicate on your request_history_step_payload scan is what hurts: it forces a sequential scan of the entire table.
So let's try a text (trigram) index to help that search step work faster. This needs the pg_trgm extension; note that GIN indexes don't support INCLUDE columns, so the index covers only the payload search itself:
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX CONCURRENTLY rhsp_payload_trgm
ON request_history_step_payload
USING GIN (payload gin_trgm_ops);
It's possible it will help. Give it a try.
When you have that index you might also try reworking your query like this:
select rh.id
from ( select step_id
       from request_history_step_payload
       where payload like '%09141%'
     ) rhsp
join request_history_step rhs on rhs.id = rhsp.step_id
join request_history rh on rhs.request_id = rh.id
where rh.id > 35000
  and rhs.step_type = 'CONSUMER_REQUEST'
This moves your search of the request_history_step_payload table into a subquery. You can work on optimizing the subquery separately as you try to get this all working fast enough for your application.
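For instance, you can time the subquery in isolation to see whether the trigram index is actually being used for the payload search:

```sql
explain analyze
select step_id
from request_history_step_payload
where payload like '%09141%';
```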
And, drop any duplicate indexes. They slow down INSERT and UPDATE operations for no good reason.
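To spot the duplicates, you can list the indexes on these tables from the standard pg_indexes view:

```sql
select tablename, indexname, indexdef
from pg_indexes
where tablename in ('request_history',
                    'request_history_step',
                    'request_history_step_payload')
order by tablename, indexname;
```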
Upvotes: 1