I have a table that stores 1M 384-dimensional vectors in a column called vector of type "char"[]. I am trying to speed up the following query by using parallelization:
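For reference, here is a minimal sketch of the table layout (illustrative; only the two columns that actually appear in the query):

-- simplified; the real table may have more columns
CREATE TABLE MyTable (
    content_id integer,
    vector     "char"[]   -- 384 one-byte elements per row
);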
EXPLAIN ANALYZE
WITH Vars(key) as (
VALUES (array_fill(1, ARRAY[384])::vector)
)
SELECT content_id
FROM MyTable, Vars
ORDER BY vector::int[]::vector <#> key
LIMIT 10;
The key is just a toy vector consisting of all ones. <#> is the dot product operator of the pgvector extension, and vector is the type defined by that extension, which to my understanding is similar to real[].
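For illustration, on toy vectors the operator behaves like this (as far as I understand, <#> actually returns the negative inner product, so the ascending ORDER BY puts the largest dot products first):

SELECT '[1,2,3]'::vector <#> '[1,1,1]'::vector;
-- returns -6, i.e. the negated dot product 1+2+3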
I am running this query in the free tier of AWS RDS. The Postgres instance has two vCPUs, so I expect to see an improvement from using two workers. Given the high dimensionality of the vectors, I expect computing the dot products to dominate the execution time and hence an improvement by almost a factor of 2 from using two workers.
To run without concurrency, I do:
set max_parallel_workers = 1;
set max_parallel_workers_per_gather = 1;
set enable_parallel_hash = off;
The output is:
Limit (cost=94267.29..94268.44 rows=10 width=12) (actual time=15624.961..15634.530 rows=10 loops=1)
-> Gather Merge (cost=94267.29..161913.85 rows=588231 width=12) (actual time=15624.958..15634.524 rows=10 loops=1)
Workers Planned: 1
Workers Launched: 1
-> Sort (cost=93267.28..94737.86 rows=588231 width=12) (actual time=15607.302..15607.305 rows=7 loops=2)
Sort Key: ((((mytable.vector)::integer[])::vector <#> '[1,1,...,1]'::vector))
Sort Method: top-N heapsort Memory: 25kB
Worker 0: Sort Method: top-N heapsort Memory: 25kB
-> Parallel Seq Scan on mytable (cost=0.00..80555.82 rows=588231 width=12) (actual time=0.413..15452.274 rows=500000 loops=2)
Planning Time: 10.502 ms
Execution Time: 15635.121 ms
(11 rows)
Next, I force two workers:
set force_parallel_mode = on;
set max_parallel_workers = 2;
set max_parallel_workers_per_gather = 2;
set enable_parallel_hash = on;
Here is the output:
Limit (cost=83268.20..83269.37 rows=10 width=12) (actual time=14369.219..14379.656 rows=10 loops=1)
-> Gather Merge (cost=83268.20..180496.59 rows=833328 width=12) (actual time=14369.217..14379.647 rows=10 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=82268.18..83309.84 rows=416664 width=12) (actual time=14352.711..14352.714 rows=9 loops=3)
Sort Key: ((((mytable.vector)::integer[])::vector <#> '[1,1,...,1]'::vector))
Sort Method: top-N heapsort Memory: 25kB
Worker 0: Sort Method: top-N heapsort Memory: 25kB
Worker 1: Sort Method: top-N heapsort Memory: 25kB
-> Parallel Seq Scan on mytable (cost=0.00..73264.22 rows=416664 width=12) (actual time=0.611..14204.459 rows=333333 loops=3)
Planning Time: 7.062 ms
Execution Time: 14380.487 ms
(12 rows)
The main question is: why is the expected time improvement not observed? Beyond that, I would really appreciate a walk-through of the output of EXPLAIN ANALYZE. In particular, what does EXPLAIN ANALYZE reveal about the reasons for not getting the expected time improvement?
You have a basic misunderstanding: the parallel workers are started in addition to the backend process, which by default also does its share of the work. So if you set max_parallel_workers_per_gather = 1, you don't disable parallel query, but limit it to one additional process. Set the parameter to 0 to disable parallel query.
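So to get a truly serial baseline, the following is enough (rerun your EXPLAIN ANALYZE after it):

-- disables parallel query entirely; the backend alone does all the work
set max_parallel_workers_per_gather = 0;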
With this, it is easier to answer your questions:
With two processes, each reads about half of the table (1,000,000 / 2 = 500,000 rows), and with three processes each reads about a third (1,000,000 / 3 ≈ 333,333 rows), which is exactly what the Parallel Seq Scan lines in your two plans show.
The estimated row counts for the sort and gather nodes show how many rows PostgreSQL estimates without considering the LIMIT, while the actual row count stops short as soon as enough rows have been fetched to satisfy the LIMIT. The discrepancy is surprising until you know that. Note, however, that PostgreSQL still considers the LIMIT in its cost estimate.
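You can verify this by planning the same query without the LIMIT, for example:

EXPLAIN
WITH Vars(key) AS (
  VALUES (array_fill(1, ARRAY[384])::vector)
)
SELECT content_id
FROM MyTable, Vars
ORDER BY vector::int[]::vector <#> key;
-- without the LIMIT, the top node's estimated row count is no longer
-- capped and matches the estimate shown for the sort node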
I assume that the “dot products” are function calls. You can see them if you use EXPLAIN (ANALYZE, VERBOSE): then you see the function call in the output columns of the node where the function is evaluated. The function execution time is not listed separately, but is included in the execution time of the node.
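A small self-contained illustration (not your query, but the same mechanism):

EXPLAIN (ANALYZE, VERBOSE)
SELECT sqrt(x) FROM generate_series(1, 5) AS t(x);
-- the scan node's "Output:" line shows the sqrt() call, i.e. the
-- expression evaluated at that node; its time is folded into the node's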
Note that you can get function execution statistics from pg_stat_user_functions if you have enabled track_functions.
The twelve rows reported at the bottom of the second plan are the output lines of EXPLAIN, not the output of the query (which returns 10 rows because of the LIMIT).
You can see that the execution time of the parallel sequential scan does not become much shorter with a second parallel worker (about 15.45 seconds per process with two processes vs. 14.20 seconds with three). Perhaps your disk is saturated. If you want to determine whether the time is spent in the CPU or doing I/O, enable track_io_timing.
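For example (using a plain scan of your table as a stand-in for the real query):

set track_io_timing = on;
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM MyTable;
-- scan nodes now include an "I/O Timings:" line (exact format varies by
-- PostgreSQL version); if it accounts for most of a node's time, the
-- query is I/O-bound rather than CPU-bound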