skyw
skyw

Reputation: 369

SQL window functions: Performance impact of returning the same avg() many times?

I would like to SELECT a bunch of rows from table A, along with the results of aggregate functions like avg(A.price) and avg(A.distance).

Now, the SELECT query takes a good bit of time, so I don't want to run one query to get the rows, and other to get the averages. If I did that, I'd be running the query to SELECT the appropriate rows twice.

But looking at the PostgreSQL window function documentation (http://www.postgresql.org/docs/9.1/static/tutorial-window.html), it seems that using window function to return the results of the aggregate functions I want to use alongside the returned rows means that every single row returned would contain the results of the aggregate functions. And in my case, since the aggregation is over all the rows returned by the main SELECT query and not a subset of its rows, this seems wasteful.

What are the performance implications of returning the same avg() many times, given that I'm selecting a subset of the rows in A but doing aggregate queries across the entire subset? In particular, does Postgres recompute the average every time, or does it cache the average somehow?

By way of analogy: If you look at the window function docs and pretend that depname is 'develop' for every row returned by the SELECT query, and that the average is the same for every row because the average was computed across all returned rows. How many times is that average computed?

Upvotes: 8

Views: 4429

Answers (2)

John Velonis
John Velonis

Reputation: 1649

According to section 7.2.4 of the doc:

When multiple window functions are used, all the window functions having syntactically equivalent PARTITION BY and ORDER BY clauses in their window definitions are guaranteed to be evaluated in a single pass over the data.

Upvotes: 5

Gordon Linoff
Gordon Linoff

Reputation: 1269873

You can use a CTE to do what you want. According to the Postgres documentation:

A useful property of WITH queries is that they are evaluated only once per execution of the parent query, even if they are referred to more than once by the parent query or sibling WITH queries. Thus, expensive calculations that are needed in multiple places can be placed within a WITH query to avoid redundant work. Another possible application is to prevent unwanted multiple evaluations of functions with side-effects. However, the other side of this coin is that the optimizer is less able to push restrictions from the parent query down into a WITH query than an ordinary sub-query. The WITH query will generally be evaluated as stated, without suppression of rows that the parent query might discard afterwards. (But, as mentioned above, evaluation might stop early if the reference(s) to the query demand only a limited number of rows.)

You can structure you final results using a structure such as:

with cte as (your basic select goes here)
select *
from cte cross join
     (select averages here
      from cte
     ) const
where < your row filter here>

Upvotes: 3

Related Questions