Reputation: 1120
First let me introduce you to an example and after that I'll ask the question. Code
SELECT
orderyear
,qty
FROM
Sales.OrderTotalsByYear;
gives me a table that looks like this
orderyear qty
----------- -----------
2007 25489
2008 16247
2006 9581
I need to return for each year the order year, quantity, and running total quantity over the years. That is, for each year, return the sum of the quantity up to that year. So for the earliest year recorded in the view (2006), the running total is equal to that year’s quantity. For the second year (2007), the running total is the sum of the first year plus the second year, and so on. The code looks like this
SELECT
orderyear
,qty
,(
SELECT
SUM(O2.qty)
FROM
Sales.OrderTotalsByYear AS O2
WHERE
O2.orderyear <= O1.orderyear
)
AS runqty
FROM
Sales.OrderTotalsByYear AS O1
ORDER BY
orderyear;
and a table
orderyear qty runqty
----------- ----------- -----------
2006 9581 9581
2007 25489 35070
2008 16247 51317
Now, I understand what this code does but I don't understand HOW it does it. I have experience in procedural and object oriented programming but this just drives me crazy. If the query goes in a way like this
- FROM
- WHERE
- GROUP BY
- HAVING
- SELECT
- ORDER BY
then how it manages to combine inner SELECT inside outer SELECT by using relationships between inner and outer SELECT? Is the outer SELECT runned first and it stops when it hits first element inside a table, and then the inner SELECT starts to run for elements where O2.orderyear <= O1.orderyear is true? Or is there something totally different going on?
Upvotes: 3
Views: 73
Reputation: 95053
You can think of the query in two steps: The subquery in your select clause is correlated, i.e. it refers to the main query via O2.orderyear <= O1.orderyear
. So the main query gets executed, and for every single record it reads from the table (aliased O1), it executes the subquery in order to get the running quantity.
So when the main query reads the record for 2008, it executes
SELECT SUM(O2.qty)
FROM Sales.OrderTotalsByYear AS O2
WHERE O2.orderyear <= 2008
There are two records with a year <= 2008: 2007 and 2008. Their quantities are being added, and thus you get the running total to display with the 2008 record. The same is done for the 2007 and the 2009 record. At last your rows are ordered so as to show the years in ascending order.
Upvotes: 0
Reputation: 239764
Logically, for each row that the system has generated (via FROM
, WHERE
, GROUP BY
and HAVING
), the system will evaluate the SELECT
clause. As part of that evaluation of each row, the system will evaluate the correlated subquery:
(
SELECT
SUM(O2.qty)
FROM
Sales.OrderTotalsByYear AS O2
WHERE
O2.orderyear <= O1.orderyear
)
Using the current row's O1.orderyear
value.
However, from a practical standpoint1, the system may be able to optimize its evaluation of this subquery. A smart enough optimizer, if the statistics suggest it is worth doing, may decide to evaluate the outer query in orderyear
order, and to also create a copy of the OrderTotalsByYear
table sorted in orderyear
order (or to make use of an index that already represents this sort order). In such a case, the system would be able to evaluate this subquery result without having to re-scan the entire OrderTotalsByYear
table for each row of the outer query.
What the optimizer does can only be ascertained by obtaining the execution plan, and will depend on your specific tables - their structure, indexes and the data contained within them.
1SQL is defined in terms of a logical processing order. Implementations are free to perform operations in a different order than the logical processing order, provided they produce the same results which would have been obtained if the logical processing order is followed2. SQL is also defined, generally, to work on sets rather than to specify row-by-row or left-to-right processing.
2SQL Server takes more liberties here than it ought to and may generate errors which would not have been generated had it followed the logical processing order. Ho hum.
Upvotes: 1