Reputation: 3048
I have a table containing about 5 million records and a GIN index on Barcodes
column.
I have this query:
SELECT T1.*
FROM public."Skus" AS T1
WHERE (T1."Barcodes" && '{ "THE_BARCODE" }')
LIMIT 11;
It takes very long (seconds) and the explain shows me the db is constantly running a seq scan
.
Now, if I omit the LIMIT
:
SELECT T1.*
FROM public."Skus" AS T1
WHERE (T1."Barcodes" && '{ "THE_BARCODE" }');
it takes only a few milliseconds and the explain shows me an index scan
.
Why? How to avoid/overcome this?
Note: queries are dynamically generated by application, so isn't easy to remove the `LIMIT' (and sometimes it is even necessary).
Additional information as specified at: https://wiki.postgresql.org/wiki/Slow_Query_Questions
SELECT version();
gives
PostgreSQL 14.7 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, 64-bit
Azure Database for PostgreSQL flexible server
CREATE TABLE IF NOT EXISTS public."Skus"
(
"Id" text COLLATE pg_catalog."default" NOT NULL,
"Style" text COLLATE pg_catalog."default" NOT NULL,
"ColorId" text COLLATE pg_catalog."default",
"DropId" text COLLATE pg_catalog."default",
"SizeId" text COLLATE pg_catalog."default",
"LotId" text COLLATE pg_catalog."default",
"Barcodes" text[] COLLATE pg_catalog."default" NOT NULL,
"AuthorizationTags" text[] COLLATE pg_catalog."default" NOT NULL,
"CreatedBy" text COLLATE pg_catalog."default" NOT NULL,
"CreatedOn" timestamp with time zone NOT NULL,
"UpdatedBy" text COLLATE pg_catalog."default" NOT NULL,
"UpdatedOn" timestamp with time zone NOT NULL,
CONSTRAINT "PK_Skus" PRIMARY KEY ("Id")
)
TABLESPACE pg_default;
ALTER TABLE IF EXISTS public."Skus"
OWNER to azadmin;
-- Index: IX_Skus_Barcodes
-- DROP INDEX IF EXISTS public."IX_Skus_Barcodes";
CREATE INDEX IF NOT EXISTS "IX_Skus_Barcodes"
ON public."Skus" USING gin
("Barcodes" COLLATE pg_catalog."default")
WITH (fastupdate=True)
TABLESPACE pg_default;
About 5 milions rows
SELECT relname, relpages, reltuples, relallvisible, relkind, relnatts, relhassubclass, reloptions, pg_table_size(oid)
FROM pg_class
WHERE relname='TABLE_NAME'
gives:
Does the table have anything unusual about it?
Note: SET track_io_timing = on
is prohibited by Azure
EXPLAIN (ANALYZE, BUFFERS, SETTINGS) SELECT T1.*
FROM public."Skus" AS T1
WHERE (T1."Barcodes" && '{ "7622336162738" }')
LIMIT 11;
gives this: https://explain.depesz.com/s/xZj7
EXPLAIN (ANALYZE, BUFFERS, SETTINGS) SELECT T1.*
FROM public."Skus" AS T1
WHERE (T1."Barcodes" && '{ "7622336162738" }');
gives this: https://explain.depesz.com/s/vrFZ
The query with LIMIT has always been slow, the other one has always been fast
Azure instance type: Burstable, B2s, 2 vCores, 4 GiB RAM, 32 GiB storage
N/A (Azure)
Are you running autovacuum? I don't know
If so, with what settings?
If not, are you doing manual VACUUM and/or ANALYZE? NO
How often?
SELECT * FROM pg_stat_user_tables WHERE relname='table_name
gives:
For data writing queries: have you moved the WAL to a different disk? Changed the settings? Not a writing query
SELECT name, current_setting(name), source
FROM pg_settings
WHERE source NOT IN ('default', 'override');
gives:
SELECT (SELECT sum(x) FROM unnest(most_common_freqs) x) frac_MCV, tablename, attname, inherited, null_frac, n_distinct, array_length(most_common_vals,1) n_mcv, array_length(histogram_bounds,1) n_hist, correlation
FROM pg_stats
WHERE attname='...' AND tablename='...'
ORDER BY 1 DESC;
gives:
Upvotes: 1
Views: 353
Reputation: 44363
It thinks there will be 25,637 qualifying rows and it will get to stop after reading only 11 of them, which would be quite fast if it were true. But it actually has to read the entire table because there is only 1 qualifying row.
I think you are running into a problem already known, that the pg_stats.most_common_elem_freqs is left null in this case, and then when the planner sees the NULL it assumes it will return 0.5% of the table which is way too high. There was agreement it was a problem, but not a consensus on the best way to solve it, so no solution was ever implemented. If you are willing to compile your own postgresql, you could try one of approaches discussed in the linked thread and see if those solve the problem for you.
The only practical solution I see is to install pg_hint_plan and then provide a hint /*+ BitmapScan("Skus") */
for the query. But if you can't alter the query text, I don't see how you could inject the hint into it. These hints are very much like Oracle's hints, so maybe your app has mechanism for doing that if it aims to be cross-platform.
Upvotes: 1