jl6
jl6

Reputation: 6394

Why does adding a window function make this query so slow?

Query A executes in microseconds:

SELECT t1.id
 FROM (SELECT t0.id AS id FROM t0) AS t1
 WHERE NOT (EXISTS (SELECT 1
        FROM t2
        WHERE t2.ph_id = t1.id AND t2.me_id = 1 AND t2.rt_id = 4))
 LIMIT 20 OFFSET 0

But query B takes about 25 seconds:

SELECT t1.id, count(*) OVER () AS count
 FROM (SELECT t0.id AS id FROM t0) AS t1
 WHERE NOT (EXISTS (SELECT 1
        FROM t2
        WHERE t2.ph_id = t1.id AND t2.me_id = 1 AND t2.rt_id = 4))
 LIMIT 20 OFFSET 0

(the difference is just one item in the select clause - a window aggregate)

EXPLAIN output is as follows, for A:

 Limit  (cost=0.00..1.20 rows=20 width=4)
   ->  Nested Loop Anti Join  (cost=0.00..3449.22 rows=57287 width=4)
         Join Filter: (t2.ph_id = t0.id)
         ->  Seq Scan on t0  (cost=0.00..1323.88 rows=57288 width=4)
         ->  Materialize  (cost=0.00..1266.02 rows=1 width=4)
               ->  Seq Scan on t2  (cost=0.00..1266.01 rows=1 width=4)
                     Filter: ((me_id = 1) AND (rt_id = 4))

And for B:

 Limit  (cost=0.00..1.45 rows=20 width=4)
   ->  WindowAgg  (cost=0.00..4165.31 rows=57287 width=4)
         ->  Nested Loop Anti Join  (cost=0.00..3449.22 rows=57287 width=4)
               Join Filter: (t2.ph_id = t0.id)
               ->  Seq Scan on t0  (cost=0.00..1323.88 rows=57288 width=4)
               ->  Materialize  (cost=0.00..1266.02 rows=1 width=4)
                     ->  Seq Scan on t2  (cost=0.00..1266.01 rows=1 width=4)
                           Filter: ((me_id = 1) AND (rt_id = 4))

I am adding the window aggregate to get the total number of rows before LIMITing, for the purpose of building a paging UI.

Upvotes: 0

Views: 4141

Answers (3)

jl6
jl6

Reputation: 6394

Thank you for the other answers, which are correct in that the counting has to do more work, but I found the solution from another source. The statistics were not up to date.

After running the command...:

ANALYZE;

... Postgresql was able to choose a more appropriate query plan, and now both queries run very fast.

Upvotes: 1

Bulat
Bulat

Reputation: 6969

Can you check how long COUN(*) takes and what execution plan looks like:

SELECT count(*)
FROM (SELECT t0.id AS id FROM t0) AS t1
WHERE NOT (EXISTS (SELECT 1
        FROM t2
        WHERE t2.ph_id = t1.id AND t2.me_id = 1 AND t2.rt_id = 4))

This might give you an idea why it takes longer.

Basically first query only reads 20 first records that match the criteria from t0, while second query has to generate the full set of records matching the criteria to count them.

Upvotes: 3

Gordon Linoff
Gordon Linoff

Reputation: 1269873

Your original query can be written like this:

SELECT t0.id
FROM t0
WHERE NOT EXISTS (SELECT 1
                  FROM t2
                  WHERE t2.ph_id = t1.id AND t2.me_id = 1 AND t2.rt_id = 4
                 )
LIMIT 20 OFFSET 0;

You have no order by, so the query can start returning results as they are found for the result set. When you add the window function:

SELECT t.0.id, count(*) over ()

Now it is counting the number of rows in the result set, so it has to generate the entire result set. Hence, instead of just getting the first twenty rows, the query has to generate all of them. That takes more time.

Upvotes: 3

Related Questions