Reputation: 368
My query (simple, with a few joins) runs extremely slowly when I have a small amount of data (~50k rows) but runs fast when I have a larger amount of data (~180k rows). The time difference is huge: from a few seconds up to almost half an hour.
I have re-checked the joins and they are all correct. In addition, I ran VACUUM ANALYZE on the table before running the query, but it didn't solve anything. I also checked whether any locks were blocking the query or whether the connection was slow, but neither turned out to be the cause.
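For completeness, a generic way to look for blocking sessions in PostgreSQL 9.6+ is to query pg_stat_activity together with pg_blocking_pids(); the following is only a sketch, nothing specific to my schema:
-- list sessions that are currently blocked and the PIDs blocking them
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       wait_event_type,
       state,
       left(query, 60) AS query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;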
Therefore, I went to check the output of EXPLAIN. In the slow case, the plan performs unnecessary extra sorts and gets stuck in a nested loop that does not appear at all in the plan for the larger dataset. I'm not sure how to tell Postgres to use the same plan as in the bigger-dataset scenario.
Based on a comment, I also tried rewriting the query without CTEs, but that doesn't help either: the plan still contains the nested loops and the sorts.
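As a diagnostic (not meant as a permanent fix), the planner can be steered away from the nested loop for a single session via the enable_nestloop setting, to check whether the fast plan comes back; this is only a sketch:
SET enable_nestloop = off;  -- discourage nested loop joins for this session only
-- re-run EXPLAIN (BUFFERS, ANALYZE) on the query below and compare the plan
RESET enable_nestloop;      -- restore the default afterwards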
PostgreSQL 12.3
WITH t0 AS (SELECT * FROM original_table WHERE id=0),
t1 AS (SELECT * FROM original_table WHERE id=1),
t2 AS (SELECT * FROM original_table WHERE id=2),
t3 AS (SELECT * FROM original_table WHERE id=3),
t4 AS (SELECT * FROM original_table WHERE id=4)
SELECT
t0.dtime,
t1.dtime,
t3.dtime,
t3.dtime::date,
t4.dtime,
t1.first_id,
t1.field,
t1.second_id,
t1.third_id,
t2.fourth_id,
t4.fourth_id
FROM t1
LEFT JOIN t0 ON t1.first_id=t0.first_id
JOIN t2 ON t1.first_id=t2.first_id AND t1.second_id = t2.second_id AND t1.third_id = t2.third_id
JOIN t3 ON t1.first_id=t3.first_id AND t1.second_id = t3.second_id AND t1.third_id = t3.third_id
JOIN t4 ON t1.first_id=t4.first_id AND t1.second_id = t4.second_id AND t1.fourth_id= t4.third_id
ORDER BY t3.dtime
;
Column | Type
----------+----------------------------
id | smallint
dtime | timestamp without time zone
first_id | character varying(10)
second_id | character varying(10)
third_id | character varying(10)
fourth_id | character varying(10)
field | character varying(10)
EXPLAIN (BUFFERS, ANALYZE) for the two cases - slow case: https://explain.depesz.com/s/5JDw, fast case: https://explain.depesz.com/s/JMIL
      name       | current_setting |        source
-----------------+-----------------+----------------------
 max_stack_depth | 2MB             | environment variable
 max_wal_size    | 1GB             | configuration file
 min_wal_size    | 80MB            | configuration file
 shared_buffers  | 128MB           | configuration file
Upvotes: 1
Views: 172
Reputation: 82146
This often happens to me on SQL Server.
What usually causes the slowness is that the CTE is executed once per joined row.
You can prevent that from happening by selecting into temp tables instead of using CTEs.
I assume the same is true for PostgreSQL, but I didn't test it:
DROP TABLE IF EXISTS tempT0;
DROP TABLE IF EXISTS tempT1;
CREATE TEMP TABLE tempT0 AS SELECT * FROM original_table WHERE id=0;
CREATE TEMP TABLE tempT1 AS SELECT * FROM original_table WHERE id=1;
[... etc]
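-- then run the same SELECT list, joins and ORDER BY as before, but against the temp tables: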
FROM tempT1 AS t1
LEFT JOIN tempT0 AS t0 ON t1.first_id=t0.first_id
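If you go this route in PostgreSQL, it probably also pays to ANALYZE the temp tables before the join so the planner has row estimates for them (again, untested on my side):
ANALYZE tempT0;  -- collect statistics so the planner can estimate the joins
ANALYZE tempT1;
[... etc]
Alternatively, since PostgreSQL 12 you can keep the CTE form and force each CTE to be evaluated only once by writing it as WITH t0 AS MATERIALIZED (SELECT ...); I haven't tested that either.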
Upvotes: 1