NewPy

Reputation: 663

What is the execution order of the PARTITION BY clause compared to other SQL clauses?

I cannot find any source mentioning execution order for Partition By window functions in SQL.

[Image: table listing the logical processing order of SQL clauses]

Is it in the same order as Group By?

For example, with a table like this:

[Image: sample rows of NPtable]

Select *, row_number() over (Partition by Name) 
from NPtable 
Where Name = 'Peter'

I understand that if WHERE is executed first, the query only looks at rows where Name = 'Peter' and then runs the window function over that one person instead of the entire table, which is much more efficient.

But when the query is:

Select top 1 *, row_number() over (Partition by Name order by Date) 
from NPtable 
Where Date > '2018-01-02 00:00:00'

Doesn't the window function need to be executed against the entire table first, with the Date > condition applied afterwards? Otherwise the result would be wrong.

Upvotes: 16

Views: 13998

Answers (3)

Venkataraman R

Reputation: 12979

It is part of the SELECT phase of the query execution. There are different types of SELECT clauses, based on the query.

  • SELECT FOR
  • SELECT GROUP BY
  • SELECT ORDER BY
  • SELECT OVER
  • SELECT INTO
  • SELECT HAVING

PARTITION BY belongs to the SELECT OVER clause. The window is built from the result set produced by the earlier stages: FROM, WHERE, GROUP BY, etc.
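As a rough illustration of "the window is built from what the earlier stages produce", here is a small sketch using Python's built-in sqlite3 module. It assumes SQLite 3.25 or later (the first version with window functions), and the rows are made up for the example, not taken from the question's table. The window COUNT(*) OVER () counts the rows it can see, which here are the grouped rows, not the base table rows.

```python
import sqlite3

# Hypothetical sample data: 3 rows for Peter, 1 row for Mary (4 base rows)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE NPtable (Name TEXT, Date TEXT)")
conn.executemany(
    "INSERT INTO NPtable VALUES (?, ?)",
    [
        ("Peter", "2018-01-01 00:00:00"),
        ("Peter", "2018-01-03 00:00:00"),
        ("Peter", "2018-01-05 00:00:00"),
        ("Mary", "2018-01-04 00:00:00"),
    ],
)

# The plain COUNT(*) is the GROUP BY aggregate (rows per name).
# COUNT(*) OVER () is a window function evaluated after GROUP BY,
# so it sees one row per group.
grouped = conn.execute("""
    SELECT Name,
           COUNT(*)          AS rows_per_name,
           COUNT(*) OVER ()  AS rows_seen_by_window
    FROM NPtable
    GROUP BY Name
    ORDER BY Name
""").fetchall()

print(grouped)
# [('Mary', 1, 2), ('Peter', 3, 2)]
```

The window sees 2 rows (one per name) even though the table holds 4, which is consistent with windowing happening after FROM, WHERE and GROUP BY.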

The OVER clause defines a window or user-specified set of rows within a query result set. A window function then computes a value for each row in the window. You can use the OVER clause with functions to compute aggregated values such as moving averages, cumulative aggregates, running totals, or a top N per group results.

OVER ( [ PARTITION BY value_expression ] [ order_by_clause ] )

Arguments

PARTITION BY Divides the query result set into partitions. The window function is applied to each partition separately and computation restarts for each partition.

value_expression Specifies the column by which the rowset is partitioned. value_expression can only refer to columns made available by the FROM clause. value_expression cannot refer to expressions or aliases in the select list. value_expression can be a column expression, scalar subquery, scalar function, or user-defined variable.

order_by_clause Defines the logical order of the rows within each partition of the result set. That is, it specifies the logical order in which the window function calculation is performed.

order_by_expression Specifies a column or expression on which to sort. order_by_expression can only refer to columns made available by the FROM clause. An integer cannot be specified to represent a column name or alias.

You can read more about it in the SELECT-OVER documentation.

Upvotes: 3

Gordon Linoff

Reputation: 1270723

row_number() (and other window functions) are allowed in two clauses:

  • SELECT
  • ORDER BY

The function is parsed along with the rest of the clause. After all, it is a function present in the clause. In both cases, the WHERE clause would be -- logically -- applied first, so the results would be after filtering.

Do note that this is a logical parsing of the query. The actual execution may have little to do with the structure of the query.
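A quick way to check the "two clauses" rule is to try it. The sketch below uses Python's built-in sqlite3 module as a stand-in engine (SQLite 3.25 or later assumed; SQL Server reports its own error for the WHERE case), with made-up rows rather than the question's table. It uses row_number() directly in ORDER BY, then tries the same function in WHERE.

```python
import sqlite3

# Hypothetical sample data
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE NPtable (Name TEXT, Date TEXT)")
conn.executemany(
    "INSERT INTO NPtable VALUES (?, ?)",
    [
        ("Peter", "2018-01-01 00:00:00"),
        ("Peter", "2018-01-03 00:00:00"),
        ("Mary", "2018-01-02 00:00:00"),
        ("Mary", "2018-01-04 00:00:00"),
    ],
)

# Window function in ORDER BY: everyone's earliest row first,
# then everyone's second row, ties broken by Name.
interleaved = conn.execute("""
    SELECT Name, Date
    FROM NPtable
    ORDER BY row_number() OVER (PARTITION BY Name ORDER BY Date), Name
""").fetchall()

# Window function in WHERE: rejected, because WHERE is processed
# before the window function can be evaluated.
try:
    conn.execute(
        "SELECT Name FROM NPtable "
        "WHERE row_number() OVER (ORDER BY Date) = 1"
    )
    where_rejected = False
except sqlite3.OperationalError:
    where_rejected = True

print(interleaved)
print(where_rejected)
```

To filter on a row number you would compute it in a derived table or CTE and apply the WHERE in the outer query.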

Upvotes: 3

Vladimir Baranov

Reputation: 32693

Window functions are executed/calculated at the same stage as SELECT, stage 5 in your table. In other words, window functions are applied to all rows that are "visible" in the SELECT stage.

In your second example

Select top 1 *, 
row_number() over (Partition by Name order by Date) 
from NPtable 
Where Date > '2018-01-02 00:00:00'

WHERE is logically applied before Partition by Name of the row_number() function.

Note that this is the logical order of processing the query, not necessarily how the engine physically processes the data.

If the query optimiser decides that it is cheaper to scan the whole table and discard rows later according to the WHERE filter, it can do so. But any such transformation must be performed in such a way that the final result is consistent with the order of the logical steps outlined in the table you showed.
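To make the difference concrete, here is a minimal sketch using Python's built-in sqlite3 module (SQLite 3.25 or later assumed). The rows are invented for illustration and are not the table from the question, and TOP 1 is left out because SQLite has no TOP. The first query is the one from the question; the second numbers the whole table in a derived table and filters afterwards, which is how you would write it if numbering over all rows were what you wanted.

```python
import sqlite3

# Hypothetical sample data
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE NPtable (Name TEXT, Date TEXT)")
conn.executemany(
    "INSERT INTO NPtable VALUES (?, ?)",
    [
        ("Peter", "2018-01-01 00:00:00"),
        ("Peter", "2018-01-03 00:00:00"),
        ("Peter", "2018-01-05 00:00:00"),
        ("Mary", "2018-01-01 00:00:00"),
        ("Mary", "2018-01-04 00:00:00"),
    ],
)

# WHERE is logically applied first, so numbering starts at 1
# among the rows that survive the filter.
filtered_first = conn.execute("""
    SELECT Name, Date,
           row_number() OVER (PARTITION BY Name ORDER BY Date) AS rn
    FROM NPtable
    WHERE Date > '2018-01-02 00:00:00'
    ORDER BY Name, Date
""").fetchall()

# Number every row in a derived table, then filter in the outer query:
# the surviving rows keep their whole-table row numbers.
numbered_first = conn.execute("""
    SELECT Name, Date, rn
    FROM (
        SELECT Name, Date,
               row_number() OVER (PARTITION BY Name ORDER BY Date) AS rn
        FROM NPtable
    ) AS numbered
    WHERE Date > '2018-01-02 00:00:00'
    ORDER BY Name, Date
""").fetchall()

print(filtered_first)
# [('Mary', '2018-01-04 00:00:00', 1), ('Peter', '2018-01-03 00:00:00', 1), ('Peter', '2018-01-05 00:00:00', 2)]
print(numbered_first)
# [('Mary', '2018-01-04 00:00:00', 2), ('Peter', '2018-01-03 00:00:00', 2), ('Peter', '2018-01-05 00:00:00', 3)]
```

Both queries return the same rows; only the row numbers differ. Neither is "wrong", they answer different questions, and whichever plan the optimiser picks must match the result of the query as written.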

Upvotes: 25
