Reputation: 1258
I've run into an issue where PostgreSQL (13.1 on Windows, default config) starts preferring a generic plan after 5 executions, even though the generic plan's estimate is clearly wrong. The trouble is that the custom plan is up to 10 times faster than the generic plan, and at least 2.5 times faster in the "worst" case. I don't want to set plan_cache_mode = force_custom_plan globally, because the query is part of software run in parallel with connection pooling over several threads, alongside dozens of smaller and larger queries, and it would feel like a hacky workaround that might also be dangerous for other queries.
I think I've tracked it down to the generic plan's estimate being wrong for the x.id = ANY($1) parts of the WHERE clause, of which I've got three.
Query goes like this:
SELECT ... FROM ... WHERE ... x.id = ANY($1) AND y.id = ANY($2) AND z.id = ANY($3) ...
Of course, this is hugely simplified, but I know that 99% of the time I pass an int8[] of 50 entries as the first parameter (the query backs a paged view of records, where one page has 50 records). That is correctly estimated as 50 rows in the custom query plan, but as 10 rows in the generic query plan.
Switching to IN ($1, $2, ... $50) fixes this, but we're currently trying to move our IN clauses over to ANY, because it's more efficient over JDBC and we've been bitten by the parameter limit a few times with IN (which can't happen with ANY, since the whole array is a single parameter). IN would also make the number of parameters variable, so the query planner would see a different query quite often (only $1 is almost always an array of 50 values on our prod system; the other parameters may hold fewer or more values depending on lots of factors).
So far I've tried a few things, without much success. Note that I do know why a generic plan is used after the 5th execution: its cost estimate is lower than the average estimate of the 5 previous custom plans. What I don't know is how to fix that wrong estimate.
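Here's roughly how the behaviour can be reproduced in isolation (a sketch; the sample table and its contents are placeholders, not our real schema):
CREATE TABLE sample (id bigint PRIMARY KEY, payload text);
INSERT INTO sample SELECT i, 'row ' || i FROM generate_series(1, 100000) i;
ANALYZE sample;
PREPARE page(bigint[]) AS SELECT * FROM sample WHERE id = ANY($1);
-- Execute this six times: the first five are planned with the actual array
-- (rows=50); from the sixth on, the cached generic plan (rows=10) typically
-- takes over, because its estimated cost undercuts the average custom plan cost.
EXPLAIN (ANALYZE) EXECUTE page('{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50}');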
Here are the relevant rows of the custom plan:
->  Bitmap Heap Scan on sbuilding b  (cost=150.51..332.59 rows=50 width=29) (actual time=0.077..0.187 rows=50 loops=2)
      Recheck Cond: (id = ANY ('{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50}'::bigint[]))
      Filter: (deletedat IS NULL)
      Heap Blocks: exact=50
      ->  Bitmap Index Scan on sbuilding_pkey  (cost=0.00..150.50 rows=50 width=0) (actual time=0.065..0.066 rows=50 loops=2)
            Index Cond: (id = ANY ('{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50}'::bigint[]))
And the same part for the generic plan:
->  Index Scan using sbuilding_pkey on sbuilding b  (cost=0.28..82.84 rows=10 width=29) (actual time=0.049..0.229 rows=50 loops=2)
      Index Cond: (id = ANY ($2))
      Filter: (deletedat IS NULL)
I've replaced the actual ids with the numbers 1-50, but the real ids all exist, as they are retrieved from the database in a step before this one.
I'd prefer not to post the full query or query plans, as they contain a lot of sensitive information (hundreds of words and values I would have to replace), but I hope these excerpts are enough to show the issue, and that one of you will be able to assist.
Edit: I don't have any array extension like intarray installed explicitly. I found a mailing list thread about estimates with the intarray extension, so I thought I'd add this; I'm not sure whether it still applies. Here's the link: https://www.postgresql.org/message-id/20150317191531.GE10492%40momjian.us
Thanks a lot!
Edit2: I just retested with a super simple query, and it seems that the query planner always assumes 10 values for = ANY(:arrayParam) in generic plans. This seems to be the culprit in my case. Is there any way to "fix" that assumption? Preferably without installing any extensions that need config changes, like pg_hint_plan. Thx!
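Here's roughly what that retest looked like (a sketch reusing the placeholder sample table from above); forcing the generic plan makes the guess visible on the first execution:
SET plan_cache_mode = force_generic_plan;  -- session level, just for testing
PREPARE p(bigint[]) AS SELECT count(*) FROM sample WHERE id = ANY($1);
-- The index scan node reports rows=10 regardless of how many elements
-- the array actually contains.
EXPLAIN EXECUTE p('{1,2,3}');
RESET plan_cache_mode;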
Edit3: I found the following in the PostgreSQL source code, in selfuncs.c:
    /*
     * Arbitrarily assume 10 elements in the eventual array value (see
     * also estimate_array_length).  We don't risk an assumption of
     * disjoint probabilities here.
     */
    for (i = 0; i < 10; i++)
    {
        if (useOr)
            s1 = s1 + s2 - s1 * s2;
        else
            s1 = s1 * s2;
    }
/*
 * Estimate number of elements in the array yielded by an expression.
 *
 * It's important that this agree with scalararraysel.
 */
int
estimate_array_length(Node *arrayexpr)
{
    /* look through any binary-compatible relabeling of arrayexpr */
    arrayexpr = strip_array_coercion(arrayexpr);

    if (arrayexpr && IsA(arrayexpr, Const))
    {
        Datum       arraydatum = ((Const *) arrayexpr)->constvalue;
        bool        arrayisnull = ((Const *) arrayexpr)->constisnull;
        ArrayType  *arrayval;

        if (arrayisnull)
            return 0;
        arrayval = DatumGetArrayTypeP(arraydatum);
        return ArrayGetNItems(ARR_NDIM(arrayval), ARR_DIMS(arrayval));
    }
    else if (arrayexpr && IsA(arrayexpr, ArrayExpr) &&
             !((ArrayExpr *) arrayexpr)->multidims)
    {
        return list_length(((ArrayExpr *) arrayexpr)->elements);
    }
    else
    {
        /* default guess --- see also scalararraysel */
        return 10;
    }
}
I'll see where I can get from there, but I still hope someone has a solution that doesn't involve me going through the whole PostgreSQL source code ;)
Upvotes: 4
Views: 1492
Reputation: 1206
If you do not want to go for the solution of setting plan_cache_mode that Laurenz Albe suggested, I have this suggestion: add an additional condition to your prepared statement as a literal that changes with every execution, so that every statement text is unique. The epoch with milliseconds could be used, for instance:
SELECT ... FROM ... WHERE ... x.id = ANY($1) AND y.id = ANY($2) AND z.id = ANY($3) ... AND 1678193956123=1678193956123
SELECT ... FROM ... WHERE ... x.id = ANY($1) AND y.id = ANY($2) AND z.id = ANY($3) ... AND 1678193974456=1678193974456
SELECT ... FROM ... WHERE ... x.id = ANY($1) AND y.id = ANY($2) AND z.id = ANY($3) ... AND 1678193987789=1678193987789
Or you could use a hash of your input params etc.
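The trick works because each distinct statement text gets its own plan cache entry, so no single entry ever reaches the five executions after which a generic plan can take over. In practice the client would compute the changing value itself; as a sketch, a millisecond epoch can also be produced like this:
-- Millisecond epoch to splice into the statement text as a literal
-- (any value that is unique per execution works just as well).
SELECT (extract(epoch FROM clock_timestamp()) * 1000)::bigint;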
Upvotes: 0
Reputation: 247790
That can be done by changing plan_cache_mode for this query:
BEGIN;
SET LOCAL plan_cache_mode = force_custom_plan;
SELECT /* your query */
COMMIT;
That will make the optimizer always use a custom plan.
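Since SET LOCAL lasts only until the end of the transaction, the setting cannot leak to other queries running over the same pooled connection. A quick check (assuming the default setting, auto):
BEGIN;
SET LOCAL plan_cache_mode = force_custom_plan;
SHOW plan_cache_mode;  -- force_custom_plan
COMMIT;
SHOW plan_cache_mode;  -- back to auto: the override ended with the transaction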
Upvotes: 2