Reputation:
For those interested in the explains
Without WHERE: https://explain.depesz.com/s/XaDH
With WHERE: https://explain.depesz.com/s/iPOL
I have a view which I can select * from and receive 500+ rows in 519 ms, if I add WHERE ordernumber = 165973 (or any order number) it takes 18 seconds. The query planner is also convinced both of these queries take ~18 seconds, but without a where condition it completes in ~500ms every time.
CREATE VIEW v_recent_api_panels AS
SELECT DISTINCT ON (s.barcode) s.barcode,
s.ordernumber,
production.machine_data_box(d.datastring) AS box,
c."Edate",
c."SDate",
p.height,
p.width,
p.blankheight,
p.blankwidth,
p.paneltype,
p.dateprocessed,
p.listnum,
p.material,
p.color,
p.boxnumber,
p.location,
p.cutout_param,
s.lastmodified,
p.machinetype,
'NULL'::text AS status,
(((((date_part('year'::text, p.danobatfab) || '-'::text) || date_part('month'::text, p.danobatfab)) || '-'::text) || date_part('day'::text, p.danobatfab)))::date AS danobatfab,
p.fabdate AS originalfabdate
FROM (((production.mv_recent_schedules s
LEFT JOIN workorders.mv_recent_panel_details p ON ((p.barcode = s.barcode)))
LEFT JOIN cupra.mv_parts c ON (((c."PrdRef")::text = s.barcode)))
LEFT JOIN production.mv_machine_data d ON ((d.barcode = p.barcode)))
WHERE (p.ordernumber IS NOT NULL)
ORDER BY s.barcode;
EDIT: My postgres client (datagrip) was adding an implicit LIMIT 500
to the query, which was only used effectively without the WHERE
statement.
Changed the title of this question to reflect my discovery.
EDIT 2: I don't know if I should leave this question up or delete it because it's rather silly, but the answer is that the select * statement LIMIT 500
would only have to evaluate the first 500 rows it happened upon, but the WHERE
forced it to evaluate all of them. My sql cient was adding the limit and I didn't consider that when I asked the question.
Upvotes: 0
Views: 269
Reputation: 246598
There must be a mistake: the fast execution plan shows an execution time of 18 seconds. It also returns 68489 rows and not 500.
The problem is your function production.machine_data_box
; it is responsible for almost all the execution time.
The problem is that the function is always evaluated for all 271651 rows, because the WHERE condition can only be applied after the query of the view is completed, because you use DISTINCT
.
I have several ideas for improvements:
Remove the ORDER BY
. It should be in the query using the view.
Don't put the DISTINCT ON
into the view. Then you can put it into your query after your extra condition has been evaluated, if that is possible.
Mark the function as expensive with COST = 1000
or higher. Then the optimizer will prefer plans that call the function less often. This probably won't help unless you get rid of the DISTINCT ON
, but it certainly is the right thing to do.
If you can, rewrite the function so that it is faster.
Upgrade to PostgreSQL 9.6 or better. There has been an improvement in 9.6 that defers evaluation of functions until after sorting. Not sure if that would help.
Upvotes: 1