Reputation: 3
I was checking execution plans for parts of a complicated query and came up with this:
postgres=# explain analyze
select * from generate_series(
(CURRENT_DATE)::timestamp without time zone,
(CURRENT_DATE + '14 days'::interval),
'1 day'::interval)
;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Function Scan on generate_series (cost=0.01..10.01 rows=1000 width=8) (actual time=0.024..0.036 rows=15 loops=1)
Planning Time: 0.031 ms
Execution Time: 0.064 ms
(3 rows)
AFAIK, postgresql estimates rows based on reltuples size for a given table, that's understandable.
Given that the mentioned generate_series
actually generates 14 rows, where does the rows=1000
come from in case of a function scan?
Upvotes: 0
Views: 359
Reputation: 44230
As an obvious workaround, you could fool the planner by wrapping the query in a subquery with a LIMIT:
select * FROM (
select * from generate_series(
(CURRENT_DATE)::timestamp without time zone,
(CURRENT_DATE + '14 days'::interval),
'1 day'::interval)
LIMIT 15;
) xxx
;
Upvotes: 0
Reputation: 14861
What's the point. This is a very clear instance of premature optimization. In this case the result is back before before the brain realizes you "pressed" run.
The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming.
Donald Knuth, The Art of Computer Programming, 1962.
It would appear it's at least as big a problem today as then.
Upvotes: 0
Reputation: 44137
The generate_series
functions which take number arguments has a "support function" which will peek at the arguments and then tell the planner how many rows to expect. But the ones that deal with time stamps do not have such support functions. Instead it just estimates that it returns prorows rows, which defaults to 1000.
You could change this estimate if you want:
alter function generate_series(timestamp without time zone, timestamp without time zone, interval)
rows 14;
But this change will not survive pg_upgrade, nor dump/reload.
This is version specific, as support functions were only implemented in v12. Before that even the number-taking forms always planned on 1000 rows (or whatever prorows was set to for that function).
Upvotes: 0
Reputation: 7882
According to the documentation:
For those interested in further details, estimation of the size of a table (before any WHERE clauses) is done in src/backend/optimizer/util/plancat.c. The generic logic for clause selectivities is in src/backend/optimizer/path/clausesel.c. The operator-specific selectivity functions are mostly found in src/backend/utils/adt/selfuncs.c.
this is the function that computes estimations for functions:
/*
* function_selectivity
*
* Returns the selectivity of a specified boolean function clause.
* This code executes registered procedures stored in the
* pg_proc relation, by calling the function manager.
*
* See clause_selectivity() for the meaning of the additional parameters.
*/
Selectivity
function_selectivity(PlannerInfo *root,
Oid funcid,
List *args,
Oid inputcollid,
bool is_join,
int varRelid,
JoinType jointype,
SpecialJoinInfo *sjinfo)
{
It looks like this C function will read data in pg_proc system catalog where we have:
postgres=# select proname, prosupport, prorows
from pg_proc
where proname like '%generate%';
proname | prosupport | prorows
------------------------------+------------------------------+---------
generate_subscripts | - | 1000
generate_subscripts | - | 1000
generate_series | generate_series_int4_support | 1000
generate_series | generate_series_int4_support | 1000
generate_series_int4_support | - | 0
generate_series | generate_series_int8_support | 1000
generate_series | generate_series_int8_support | 1000
generate_series_int8_support | - | 0
generate_series | - | 1000
generate_series | - | 1000
generate_series | - | 1000
generate_series | - | 1000
(12 rows)
It looks like the pg_proc.prorows column is the retrieved estimation.
Upvotes: 1