I have a table that stores 1M 384-dimensional vectors in a column called vector of type "char"[]. I am trying to speed up the following query by using parallelization:
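For reference, here is a minimal sketch of the table layout (illustrative; only the two columns that actually appear in the query):

-- simplified; the real table may have more columns
CREATE TABLE MyTable (
    content_id integer,
    vector     "char"[]   -- 384 one-byte elements per row
);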
EXPLAIN ANALYZE
WITH Vars(key) as (
VALUES (array_fill(1, ARRAY[384])::vector)
)
SELECT content_id
FROM MyTable, Vars
ORDER BY vector::int[]::vector <#> key
LIMIT 10;
The key is just a toy vector consisting of all ones. <#> is the dot product operator of the pgvector extension, and vector is the type defined by that extension, which to my understanding is similar to real[].
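For illustration, on toy vectors the operator behaves like this (as far as I understand, <#> actually returns the negative inner product, so the ascending ORDER BY puts the largest dot products first):

SELECT '[1,2,3]'::vector <#> '[1,1,1]'::vector;
-- returns -6, i.e. the negated dot product 1+2+3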
I am running this query in the free tier of AWS RDS. The Postgres instance has two vCPUs, so I expect to see an improvement from using two workers. Given the high dimensionality of the vectors, I expect computing the dot products to dominate the execution time and hence an improvement by almost a factor of 2 from using two workers.
To run without concurrency, I do:
set max_parallel_workers = 1;
set max_parallel_workers_per_gather = 1;
set enable_parallel_hash = off;
The output is:
Limit (cost=94267.29..94268.44 rows=10 width=12) (actual time=15624.961..15634.530 rows=10 loops=1)
-> Gather Merge (cost=94267.29..161913.85 rows=588231 width=12) (actual time=15624.958..15634.524 rows=10 loops=1)
Workers Planned: 1
Workers Launched: 1
-> Sort (cost=93267.28..94737.86 rows=588231 width=12) (actual time=15607.302..15607.305 rows=7 loops=2)
Sort Key: ((((mytable.vector)::integer[])::vector <#> '[1,1,...,1]'::vector))
Sort Method: top-N heapsort Memory: 25kB
Worker 0: Sort Method: top-N heapsort Memory: 25kB
-> Parallel Seq Scan on mytable (cost=0.00..80555.82 rows=588231 width=12) (actual time=0.413..15452.274 rows=500000 loops=2)
Planning Time: 10.502 ms
Execution Time: 15635.121 ms
(11 rows)
Next, I force two workers:
set force_parallel_mode = on;
set max_parallel_workers = 2;
set max_parallel_workers_per_gather = 2;
set enable_parallel_hash = on;
Here is the output:
Limit (cost=83268.20..83269.37 rows=10 width=12) (actual time=14369.219..14379.656 rows=10 loops=1)
-> Gather Merge (cost=83268.20..180496.59 rows=833328 width=12) (actual time=14369.217..14379.647 rows=10 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=82268.18..83309.84 rows=416664 width=12) (actual time=14352.711..14352.714 rows=9 loops=3)
Sort Key: ((((mytable.vector)::integer[])::vector <#> '[1,1,...,1]'::vector))
Sort Method: top-N heapsort Memory: 25kB
Worker 0: Sort Method: top-N heapsort Memory: 25kB
Worker 1: Sort Method: top-N heapsort Memory: 25kB
-> Parallel Seq Scan on mytable (cost=0.00..73264.22 rows=416664 width=12) (actual time=0.611..14204.459 rows=333333 loops=3)
Planning Time: 7.062 ms
Execution Time: 14380.487 ms
(12 rows)
The main question is: why is the expected time improvement not observed? Beyond that, I would really appreciate a walk-through of the output of EXPLAIN ANALYZE. In particular, what does EXPLAIN ANALYZE reveal about the reasons for not getting the expected time improvement?
You have a basic misunderstanding: the parallel workers are started in addition to the backend process, which by default also does its share of the work. So if you set max_parallel_workers_per_gather = 1, you don't disable parallel query, but limit it to one additional process. Set the parameter to 0 to disable parallel query.
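So to get a truly serial baseline, the following is enough (rerun your EXPLAIN ANALYZE after it):

-- disables parallel query entirely; the backend alone does all the work
set max_parallel_workers_per_gather = 0;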
With this, it is easier to answer your questions:
With two processes, each reads about half of the table (1,000,000 / 2 = 500,000 rows), and with three processes each reads about a third (1,000,000 / 3 ≈ 333,333 rows), which is exactly what the Parallel Seq Scan lines in your two plans show.
The estimated row counts for the sort and gather nodes show how many rows PostgreSQL estimates without considering the LIMIT, while the actual row count stops short as soon as enough rows have been fetched to satisfy the LIMIT. The discrepancy is surprising until you know that. Note, however, that PostgreSQL still considers the LIMIT in its cost estimate.
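You can verify this by planning the same query without the LIMIT, for example:

EXPLAIN
WITH Vars(key) AS (
  VALUES (array_fill(1, ARRAY[384])::vector)
)
SELECT content_id
FROM MyTable, Vars
ORDER BY vector::int[]::vector <#> key;
-- without the LIMIT, the top node's estimated row count is no longer
-- capped and matches the estimate shown for the sort node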
I assume that the “dot products” are function calls. You can see them if you use EXPLAIN (ANALYZE, VERBOSE): then you see the function call in the output columns of the node where the function is evaluated. The function execution time is not listed separately, but is included in the execution time of the node.
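A small self-contained illustration (not your query, but the same mechanism):

EXPLAIN (ANALYZE, VERBOSE)
SELECT sqrt(x) FROM generate_series(1, 5) AS t(x);
-- the scan node's "Output:" line shows the sqrt() call, i.e. the
-- expression evaluated at that node; its time is folded into the node's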
Note that you can get function execution statistics from pg_stat_user_functions if you have enabled track_functions.
The twelve rows reported at the bottom of the second plan are the output lines of EXPLAIN, not the output of the query (which returns 10 rows because of the LIMIT).
You can see that the execution time of the parallel sequential scan does not become much shorter with a second parallel worker (about 15.45 seconds per process with two processes vs. 14.20 seconds with three). Perhaps your disk is saturated. If you want to determine whether the time is spent in the CPU or doing I/O, enable track_io_timing.
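For example (using a plain scan of your table as a stand-in for the real query):

set track_io_timing = on;
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM MyTable;
-- scan nodes now include an "I/O Timings:" line (exact format varies by
-- PostgreSQL version); if it accounts for most of a node's time, the
-- query is I/O-bound rather than CPU-bound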