gcbenison

Reputation: 11963

How can I get a good query plan without selecting partitions by a literal value?

I have a large table, foos, partitioned by foo_type. The following yields a good query plan (just one partition selected):

select count(*) from foos where foo_type=1;

But if I replace the literal "1" with an equivalent subquery, I end up with a plan that scans every partition:

select count(*) from foos where foo_type=(select min(foo_type) from favorite_foo_types);

How can I write a query that uses a subselect in the WHERE clause like this without scanning every partition?

Upvotes: 0

Views: 105

Answers (1)

Jon Roberts

Reputation: 2106

You didn't provide code, which is likely why no one has answered the question. The short answer: dynamic partition elimination works in Greenplum, but the EXPLAIN output looks different from a plan where a literal value is provided.

Example:

First, create your favorite_foo_types table.

create table public.favorite_foo_types
(id int, foo_type int)
distributed by (id);

insert into public.favorite_foo_types 
values (1, 1), (2,2), (3,3), (4,4), (5,5);

analyze public.favorite_foo_types;
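Given those rows, the subquery used further down should return 1, the same value as the literal in the first plan; a quick check:

select min(foo_type) from public.favorite_foo_types;
-- returns 1, matching the literal used below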

Next, create your partitioned table.

create table public.foos 
(id int, foo_type int)
distributed by (id)
partition by list (foo_type)
(
partition foo_1 values (1),
partition foo_2 values (2),
partition foo_3 values (3),
partition foo_4 values (4),
partition foo_5 values (5)
);

insert into public.foos 
select i as id, case when i between 1 and 1999 then 1 
when i between 2000 and 3999 then 2 
when i between 4000 and 5999 then 3 
when i between 6000 and 7999 then 4 
when i between 8000 and 9999 then 5 end as foo_type
from generate_series(1,9999) as i;

analyze public.foos;
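Optionally, before looking at plans, you can confirm the partition layout and row distribution. This is just a sanity-check sketch, assuming Greenplum's pg_partitions catalog view is available:

-- one row per list partition, with its bound value
select partitiontablename, partitionlistvalues
from pg_partitions
where tablename = 'foos';

-- row counts per partition key
select foo_type, count(*)
from public.foos
group by foo_type
order by foo_type;
-- expect 1999 rows for foo_type 1 and 2000 rows for each of the others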

Here is the plan when using a literal value. You can see that it selects just one partition.

explain analyze 
select count(*) 
from public.foos 
where foo_type = 1;

Aggregate  (cost=0.00..431.07 rows=1 width=8)
  Rows out:  1 rows with 0.722 ms to first row, 0.723 ms to end, start offset by 0.298 ms.
  ->  Gather Motion 2:1  (slice1; segments: 2)  (cost=0.00..431.07 rows=1 width=8)
    Rows out:  2 rows at destination with 0.717 ms to first row, 0.718 ms to end, start offset by 0.299 ms.
    ->  Aggregate  (cost=0.00..431.07 rows=1 width=8)
          Rows out:  Avg 1.0 rows x 2 workers.  Max 1 rows (seg0) with 0.287 ms to end, start offset by 0.663 ms.
          ->  Sequence  (cost=0.00..431.07 rows=1000 width=4)
            Rows out:  Avg 999.5 rows x 2 workers.  Max 1000 rows (seg0) with 0.036 ms to first row, 0.215 ms to end, start offset by 0.663 ms.
            ->  Partition Selector for foos (dynamic scan id: 1)  (cost=10.00..100.00 rows=50 width=4)
              Filter: foo_type = 1
              Partitions selected:  1 (out of 5)
              Rows out:  0 rows (seg0) with 0.004 ms to end, start offset by 0.663 ms.
            ->  Dynamic Table Scan on foos (dynamic scan id: 1)  (cost=0.00..431.07 rows=1000 width=4)
              Filter: foo_type = 1
              Rows out:  Avg 999.5 rows x 2 workers.  Max 1000 rows (seg0) with 0.032 ms to first row, 0.140 ms to end, start offset by 0.667 ms.
              Partitions scanned:  Avg 1.0 (out of 5) x 2 workers.  Max 1 parts (seg0).
Slice statistics:
  (slice0)    Executor memory: 408K bytes.
  (slice1)    Executor memory: 195K bytes avg x 2 workers, 195K bytes max (seg0).
Statement statistics:
  Memory used: 128000K bytes
Settings:  optimizer=on
Optimizer status: PQO version 1.650
Total runtime: 1.162 ms

Now, your query:

explain analyze 
select count(*) 
from public.foos 
where foo_type=(select min(foo_type) from public.favorite_foo_types);

Aggregate  (cost=0.00..863.04 rows=1 width=8)
  Rows out:  1 rows with 6.466 ms to end, start offset by 24 ms.
  ->  Gather Motion 2:1  (slice3; segments: 2)  (cost=0.00..863.04 rows=1 width=8)
    Rows out:  2 rows at destination with 5.415 ms to first row, 6.459 ms to end, start offset by 24 ms.
    ->  Aggregate  (cost=0.00..863.04 rows=1 width=8)
          Rows out:  Avg 1.0 rows x 2 workers.  Max 1 rows (seg0) with 4.514 ms to end, start offset by 24 ms.
          ->  Hash Join  (cost=0.00..863.04 rows=5000 width=1)
            Hash Cond: foos.foo_type = inner.min
            Rows out:  Avg 999.5 rows x 2 workers.  Max 1000 rows (seg0) with 3.464 ms to first row, 4.441 ms to end, start offset by 24 ms.
            Executor memory:  1K bytes avg, 1K bytes max (seg0).
            Work_mem used:  1K bytes avg, 1K bytes max (seg0). Workfile: (0 spilling, 0 reused)
            (seg0)   Hash chain length 1.0 avg, 1 max, using 1 of 524341 buckets.
            ->  Dynamic Table Scan on foos (dynamic scan id: 1)  (cost=0.00..431.10 rows=5000 width=4)
              Rows out:  Avg 999.5 rows x 2 workers.  Max 1000 rows (seg0) with 0.382 ms to first row, 0.478 ms to end, start offset by 27 ms.
              Partitions scanned:  Avg 1.0 (out of 5) x 2 workers.  Max 1 parts (seg0).
            ->  Hash  (cost=100.00..100.00 rows=50 width=4)
              Rows in:  Avg 1.0 rows x 2 workers.  Max 1 rows (seg0) with 0.197 ms to end, start offset by 27 ms.
              ->  Partition Selector for foos (dynamic scan id: 1)  (cost=10.00..100.00 rows=50 width=4)
                Filter: foos.id = min
                Rows out:  Avg 1.0 rows x 2 workers.  Max 1 rows (seg0) with 0.189 ms to first row, 0.190 ms to end, start offset by 27 ms.
                ->  Broadcast Motion 1:2  (slice2)  (cost=0.00..431.00 rows=2 width=4)
                      Rows out:  Avg 1.0 rows x 2 workers at destination.  Max 1 rows (seg0) with 0.015 ms to end, start offset by 27 ms.
                      ->  Aggregate  (cost=0.00..431.00 rows=1 width=4)
                        Rows out:  1 rows with 0.020 ms to end, start offset by 26 ms.
                        ->  Gather Motion 2:1  (slice1; segments: 2)  (cost=0.00..431.00 rows=1 width=4)
                          Rows out:  2 rows at destination with 0.009 ms to first row, 0.010 ms to end, start offset by 26 ms.
                          ->  Aggregate  (cost=0.00..431.00 rows=1 width=4)
                            Rows out:  Avg 1.0 rows x 2 workers.  Max 1 rows (seg0) with 0.079 ms to end, start offset by 25 ms.
                            ->  Table Scan on favorite_foo_types  (cost=0.00..431.00 rows=3 width=4)
                                  Rows out:  Avg 2.5 rows x 2 workers.  Max 3 rows (seg0) with 0.065 ms to first row, 0.067 ms to end, start offset by 25 ms.
Slice statistics:
  (slice0)    Executor memory: 414K bytes.
  (slice1)    Executor memory: 245K bytes avg x 2 workers, 245K bytes max (seg0).
  (slice2)    Executor memory: 253K bytes (entry db).
  (slice3)    Executor memory: 8493K bytes avg x 2 workers, 8493K bytes max (seg0).  Work_mem: 1K bytes max.
Statement statistics:
  Memory used: 128000K bytes
Settings:  optimizer=on
Optimizer status: PQO version 1.650
Total runtime: 30.161 ms

Notice that the query plan has "Dynamic Table Scan on foos" and, below that, "Partitions scanned: Avg 1.0 (out of 5)". This means it dynamically eliminated 4 partitions and scanned only 1.
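Also note the "Settings: optimizer=on" and "PQO version" lines in both plans: dynamic partition elimination here comes from the GPORCA optimizer. If your plan scans every partition, it may be worth checking that GPORCA is enabled for your session:

show optimizer;     -- 'on' means GPORCA produced the plan
set optimizer = on; -- enable GPORCA for this session if it is off

explain
select count(*)
from public.foos
where foo_type = (select min(foo_type) from public.favorite_foo_types);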

There is also a graphical plan checker on greenplum.org that can help you read the plan.

Upvotes: 1
