Haibo.Yang

Reputation: 37

Why does a partitioned table in Greenplum use a nested loop join, while a non-partitioned table uses a hash join?

I created two tables (A, B) with 100 columns, with the same DDL except that B was partitioned:

CREATE TABLE A (
  id integer, ......, col integer,
  CONSTRAINT A_pkey PRIMARY KEY (id))
WITH (OIDS = FALSE)
TABLESPACE pg_default
DISTRIBUTED BY (id);

CREATE TABLE B (
  id integer, ......, col integer,
  CONSTRAINT B_pkey PRIMARY KEY (id))
WITH (OIDS = FALSE)
TABLESPACE pg_default
DISTRIBUTED BY (id)
PARTITION BY RANGE(id) 
  (START (1) END (2100000) EVERY (500000), 
   DEFAULT PARTITION extra 
  );

and imported the same data (2,000,000 rows) into A and B. Then I executed the following SQL against A and B separately:

UPDATE A a SET col = c.col FROM C c WHERE c.id = a.id;
UPDATE B b SET col = c.col FROM C c WHERE c.id = b.id;

As a result, A succeeded after a minute, but B ran for a long time and eventually failed with a memory error:

ERROR:  Canceling query because of high VMEM usage.

So I checked the EXPLAIN output of the two statements and found that A used a hash join but B used a nested loop join.

Is there some reason why a partitioned table uses a nested loop join? Is it pointless for Greenplum to partition a table that stores millions of rows?

Upvotes: 1

Views: 456

Answers (1)

Jon Roberts

Reputation: 2106

You are doing a few things that are not recommended, which might explain why you are seeing nested loops.

  1. Avoid UPDATE statements in general. The old version of the row remains on disk alongside the new version, so updating an entire table effectively doubles its physical size on disk.
  2. I've never seen a heap table used for a partitioned table. You should mainly be using append-only tables in Greenplum, especially for larger tables such as a partitioned table.
  3. You are partitioning by the distribution key. That is not recommended and not beneficial at all. Are you planning to filter by a range of IDs? That would be pretty unusual; if so, change the distribution key to something else.
  4. I thought Pivotal disabled the ability to create a primary key on a partitioned table; at one time that was not allowed. I would discourage you from creating any primary keys at all, as they just take up space and the optimizer typically doesn't use them.
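One further routine step worth checking (an editorial suggestion, not part of the original answer): stale or missing table statistics can also push the planner toward nested loop plans, so make sure all three tables have been analyzed before comparing plans:

```sql
-- Refresh planner statistics so join cardinalities are estimated correctly
ANALYZE a;
ANALYZE b;
ANALYZE c;
```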

After fixing these items, I can't reproduce your nested loop issue. I'm using version 5.0.0 too.

    drop table if exists a;
    drop table if exists b;
    drop table if exists c;
    CREATE TABLE A 
    (id integer, col integer, mydate timestamp)
    WITH (appendonly=true)
    DISTRIBUTED BY (id);

    CREATE TABLE B 
    (id integer, col integer, mydate timestamp)
    WITH (appendonly=true)
    DISTRIBUTED BY (id)
    PARTITION BY RANGE(mydate) 
      (START ('2015-01-01'::timestamp) END ('2018-12-31'::timestamp) EVERY ('1 month'::interval), 
       DEFAULT PARTITION extra 
      );

    create table c
    (id integer, col integer, mydate timestamp)
    distributed by (id);

    insert into a
    select i, i+10, '2015-01-01'::timestamp + '1 day'::interval*i
    from generate_series(0, 2000) as i
    where '2015-01-01'::timestamp + '1 day'::interval*i < '2019-01-01'::timestamp;

    insert into b
    select i, i+10, '2015-01-01'::timestamp + '1 day'::interval*i
    from generate_series(0, 2000) as i
    where '2015-01-01'::timestamp + '1 day'::interval*i < '2019-01-01'::timestamp;

    insert into c
    select i, i+10, '2015-01-01'::timestamp + '1 day'::interval*i
    from generate_series(0, 2000) as i
    where '2015-01-01'::timestamp + '1 day'::interval*i < '2019-01-01'::timestamp;


    explain UPDATE A a SET col = c.col from C c where c.id = a.id;
    /*
    "Update  (cost=0.00..862.13 rows=1 width=1)"
    "  ->  Result  (cost=0.00..862.00 rows=1 width=34)"
    "        ->  Split  (cost=0.00..862.00 rows=1 width=30)"
    "              ->  Hash Join  (cost=0.00..862.00 rows=1 width=30)"
    "                    Hash Cond: public.a.id = c.id"
    "                    ->  Table Scan on a  (cost=0.00..431.00 rows=1 width=26)"
    "                    ->  Hash  (cost=431.00..431.00 rows=1 width=8)"
    "                          ->  Table Scan on c  (cost=0.00..431.00 rows=1 width=8)"
    "Settings:  optimizer_join_arity_for_associativity_commutativity=18"
    "Optimizer status: PQO version 2.42.0"
    */

    explain UPDATE B b SET col = c.col from C c where c.id = b.id;
    /*
    "Update  (cost=0.00..862.13 rows=1 width=1)"
    "  ->  Result  (cost=0.00..862.00 rows=1 width=34)"
    "        ->  Split  (cost=0.00..862.00 rows=1 width=30)"
    "              ->  Hash Join  (cost=0.00..862.00 rows=1 width=30)"
    "                    Hash Cond: public.a.id = c.id"
    "                    ->  Table Scan on a  (cost=0.00..431.00 rows=1 width=26)"
    "                    ->  Hash  (cost=431.00..431.00 rows=1 width=8)"
    "                          ->  Table Scan on c  (cost=0.00..431.00 rows=1 width=8)"
    "Settings:  optimizer_join_arity_for_associativity_commutativity=18"
    "Optimizer status: PQO version 2.42.0"

    */

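The plans above come from GPORCA ("PQO version 2.42.0"). If you still see a nested loop after the fixes above, it can help to compare against the legacy Postgres planner's plan by toggling the `optimizer` GUC for the session (a sketch; the plan shapes may differ between the two optimizers):

```sql
-- Fall back to the legacy planner for this session and compare the plan
SET optimizer = off;
EXPLAIN UPDATE B b SET col = c.col FROM C c WHERE c.id = b.id;
SET optimizer = on;  -- switch back to GPORCA
```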
Upvotes: 1
