Osaf Malik
Osaf Malik

Reputation: 90

Primary Index not being used

I have a greenplum Cluster with below specifications:

Master (16 VCPUs, 32GB RAM, 27GB Swap) 4 Segments (16 VCPUs, 62GB RAM, 27GB Swap) on each

Earlier i had two segments and was having outstanding performance for my use cases but ever since i have expanded the cluster to four nodes, i am unable to get the indexes to being used by the queries.

The queries which were being being executed within 10ms (with index hit) are now taking 2-5 seconds on sequential scan.

I have attached my schema and some sample explain analyze outputs(This is a sample query plan, relevant table has 48260809 number of rows in it).

Schema:

\c dmiprod
Create Table dmiprod_schema."package" 
(
    identity varchar(4096) not null,
    "identityHash" bytea not null,
    "packageDate" date not null,
    ctime timestamp not null,
    customer varchar(32) not null
) distributed by ("identityHash");
ALTER TABLE ONLY dmiprod_schema.package ADD CONSTRAINT "package_pkey" PRIMARY KEY ("identityHash");
create index idx_package_ctime on dmiprod_schema.package ("ctime");
create index idx_package_packageDate on dmiprod_schema.package ("packageDate");
create index idx_package_customer on dmiprod_schema.package ("customer");

CREATE TABLE dmiprod_schema."tags"
(
    "identityHash" bytea not null, 
    tag varchar(32) not null,
    UNIQUE ("identityHash",tag)
) distributed by ("identityHash");
create index "idx_tags_identityHash" on dmiprod_schema.tags ("identityHash");
create index idx_tags_tag on dmiprod_schema.tags ("tag");

CREATE TABLE dmiprod_schema."features"
(
    "identityHash" bytea not null, 
    ctime timestamp not null,
    utime timestamp not null,
    phash varchar(64) ,
    ahash varchar(64),
    chash varchar(78),
    iimages JSON ,
    lcert JSON ,
    slogos JSON
) distributed by ("identityHash");
ALTER TABLE ONLY dmiprod_schema.features ADD CONSTRAINT "features_pkey" PRIMARY KEY ("identityHash");
create index idx_features_phash on dmiprod_schema.features ("phash");


CREATE TABLE dmiprod_schema."raw"
(
    "identityHash" bytea not null, 
    ctime timestamp not null,
    utime timestamp not null,
    ourl TEXT,
    lurl TEXT,
    "pageText" TEXT,
    "ocrText" TEXT,
    html TEXT,
    meta JSON 
) distributed by ("identityHash");
ALTER TABLE ONLY dmiprod_schema.raw ADD CONSTRAINT "raw_pkey" PRIMARY KEY ("identityHash");


CREATE TABLE dmiprod_schema.packageLock
(
    "identityHash" bytea not null,
    secret bytea not null,
    ctime timestamp not null,
    UNIQUE ("identityHash")
) distributed by ("identityHash");
ALTER TABLE ONLY dmiprod_schema.packageLock ADD CONSTRAINT "packageLock_pkey" PRIMARY KEY ("identityHash");
create index idx_packageLock_secret on dmiprod_schema.packageLock ("secret");

Upvotes: 0

Views: 71

Answers (2)

Jim McCann
Jim McCann

Reputation: 17

Just to confirm: when you expanded the system (assuming the use of gpexpand), you did run the redistribution phase (gpexpand is a two step process)? When that completed, did you run analyzedb on the system to make sure statistics were updated with the new table/index segments?

Upvotes: 0

Greg Spiegelberg
Greg Spiegelberg

Reputation: 31

Recreating the table and the indexes, inserting 20 million 32, 48, 64, 96 and 128 length random bytea in identityHash, and then performing same select results in package_pkey being used and in under 20ms.

Aside from index usage, the other difference is usage of GPORCA optimizer. I suggest you set optimizer = 'on'; and try again. If that does not work, post your GPDB/Greenplum version and session settings include optimizer, enable_indexscan, and any other relevant settings.

I tested on VMs single physical host and Tanzu Greenplum 6.17.1 with 4 segment hosts.

Upvotes: 1

Related Questions