Reputation: 90
I have a Greenplum cluster with the following specifications:
Master (16 vCPUs, 32GB RAM, 27GB swap); 4 segment hosts (16 vCPUs, 62GB RAM, 27GB swap each)
Earlier I had two segment hosts and was getting outstanding performance for my use cases, but ever since I expanded the cluster to four hosts, I have been unable to get queries to use the indexes.
Queries that used to execute within 10ms (with an index hit) now take 2-5 seconds on a sequential scan.
I have attached my schema and some sample EXPLAIN ANALYZE outputs (this is a sample query plan; the relevant table has 48,260,809 rows).
Schema:
\c dmiprod
CREATE TABLE dmiprod_schema."package"
(
identity varchar(4096) not null,
"identityHash" bytea not null,
"packageDate" date not null,
ctime timestamp not null,
customer varchar(32) not null
) distributed by ("identityHash");
ALTER TABLE ONLY dmiprod_schema.package ADD CONSTRAINT "package_pkey" PRIMARY KEY ("identityHash");
CREATE INDEX idx_package_ctime ON dmiprod_schema.package ("ctime");
CREATE INDEX idx_package_packageDate ON dmiprod_schema.package ("packageDate");
CREATE INDEX idx_package_customer ON dmiprod_schema.package ("customer");
CREATE TABLE dmiprod_schema."tags"
(
"identityHash" bytea not null,
tag varchar(32) not null,
UNIQUE ("identityHash",tag)
) distributed by ("identityHash");
create index "idx_tags_identityHash" on dmiprod_schema.tags ("identityHash");
create index idx_tags_tag on dmiprod_schema.tags ("tag");
CREATE TABLE dmiprod_schema."features"
(
"identityHash" bytea not null,
ctime timestamp not null,
utime timestamp not null,
phash varchar(64),
ahash varchar(64),
chash varchar(78),
iimages JSON,
lcert JSON,
slogos JSON
) distributed by ("identityHash");
ALTER TABLE ONLY dmiprod_schema.features ADD CONSTRAINT "features_pkey" PRIMARY KEY ("identityHash");
CREATE INDEX idx_features_phash ON dmiprod_schema.features ("phash");
CREATE TABLE dmiprod_schema."raw"
(
"identityHash" bytea not null,
ctime timestamp not null,
utime timestamp not null,
ourl TEXT,
lurl TEXT,
"pageText" TEXT,
"ocrText" TEXT,
html TEXT,
meta JSON
) distributed by ("identityHash");
ALTER TABLE ONLY dmiprod_schema.raw ADD CONSTRAINT "raw_pkey" PRIMARY KEY ("identityHash");
CREATE TABLE dmiprod_schema.packageLock
(
"identityHash" bytea not null,
secret bytea not null,
ctime timestamp not null,
UNIQUE ("identityHash")
) distributed by ("identityHash");
ALTER TABLE ONLY dmiprod_schema.packageLock ADD CONSTRAINT "packageLock_pkey" PRIMARY KEY ("identityHash");
CREATE INDEX idx_packageLock_secret ON dmiprod_schema.packageLock ("secret");
Upvotes: 0
Views: 71
Reputation: 17
Just to confirm: when you expanded the system (assuming you used gpexpand), did you run the redistribution phase (gpexpand is a two-step process)? And when that completed, did you run analyzedb on the system to make sure statistics were updated for the newly redistributed tables and indexes?
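For reference, a rough sketch of that second phase and the statistics refresh, assuming the expanded database is dmiprod and Greenplum 6 defaults (gpexpand records its progress in a gpexpand schema inside that database):

# phase 2: redistribute existing tables onto the new segments (here with a 60-hour window)
gpexpand -d 60:00:00

# verify every table finished redistributing
psql dmiprod -c "SELECT fq_name, status FROM gpexpand.status_detail WHERE status <> 'COMPLETED';"

# refresh statistics so the planner sees the new data layout
analyzedb -d dmiprod

# once everything is COMPLETED and analyzed, drop the expansion schema
gpexpand -c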
Upvotes: 0
Reputation: 31
Recreating the table and the indexes, inserting 20 million random bytea values of length 32, 48, 64, 96 and 128 into identityHash, and then performing the same select results in package_pkey being used, and in under 20ms.
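A sketch of a comparable test load, assuming the pgcrypto extension is available for gen_random_bytes (the identity and customer values here are made-up fillers, not the original data):

CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- 20 million rows with identityHash lengths cycling through 32/48/64/96/128 bytes
INSERT INTO dmiprod_schema.package (identity, "identityHash", "packageDate", ctime, customer)
SELECT 'id-' || g,
       gen_random_bytes((ARRAY[32,48,64,96,128])[1 + (g % 5)]),
       current_date,
       now(),
       'cust-' || (g % 100)
FROM generate_series(1, 20000000) g;

-- point lookup that should be served by package_pkey
SELECT * FROM dmiprod_schema.package
WHERE "identityHash" = (SELECT "identityHash" FROM dmiprod_schema.package LIMIT 1);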
Aside from index usage, the other difference is use of the GPORCA optimizer. I suggest you SET optimizer = 'on'; and try again. If that does not work, post your GPDB/Greenplum version and session settings, including optimizer, enable_indexscan, and any other relevant settings.
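For example, from the same session (these are standard GPDB settings):

-- check the server version and current planner settings
SELECT version();
SHOW optimizer;
SHOW enable_indexscan;

-- enable GPORCA for this session, then re-run the slow query with EXPLAIN ANALYZE
SET optimizer = on;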
I tested on VMs on a single physical host, running Tanzu Greenplum 6.17.1 with 4 segment hosts.
Upvotes: 1