Reputation: 66
In this scenario is the SELECT TOP necessary within the CTE (micro optimization, I know...).
DECLARE @pageSize INT, @currentPage INT, @top INT
SET @pageSize = 10
SET @currentPage = 150
SET @top = @pageSize * @currentPage + @pageSize
WITH t AS
(
SELECT TOP(***@top***) ID, name
ROW_NUMBER() OVER (ORDER BY ID) AS _row,
FROM dbo.User
)
SELECT TOP(@pageSize) *
FROM t
WHERE t._row > (@top-@pageSize)
ORDER BY t.ID
The above returns 10 (@pageSize) rows from a start number (@top-@pageSize) in a specific order with a row number column. Does the CTE statement know that the "SELECT TOP" outside of the CTE and the WHERE-clause, also outside the CTE, is to come, hence the CTE never returns more rows in the specific order than needed?
Basically just talking about the ROW_NUMBER function, that it does not count a row number for rows not returned (if I were to have millions of rows...), and also if I were to select top 100 in the CTE, would row_number still be calculated for all the million rows within the table selected?
I have tried with and without "SELECT TOP(@top)" in the CTE-statement, inside a loop with 10.000 runs, without seeing any difference in time usage. Though, I have only 38.000 rows in the table at the moment.
Edit: So the result:
WITH t AS
(
**DO A TOP() WITH AN ORDER BY IN THE CTE**
SELECT TOP(@top) ID, name
ROW_NUMBER() OVER (ORDER BY ID) AS _row,
FROM dbo.User
ORDER BY ID
)
SELECT TOP(@pageSize) *
**SELECTING TOP N FROM THE CTE, WHERE ROW-NUMBER IS ... DUE TO THE CTE IS IN ORDER ALREADY**
FROM t
WHERE t._row > (@top-@pageSize)
This could probably be more efficient if I ORDERED them "backwards", selecting the "bottom @pageSize" of the CTE, which would leave out the where-clause... but that would require some test if it actually were faster...
Upvotes: 1
Views: 5438
Reputation: 138960
If you have a supporting index on ID
you do not have to read and enumerate the whole table. SQL Server will have to read the table up to and including the page you want. So if for example you want page 1 (rows 11 to 20) the query will only fetch 20 rows. And that is true even if you don't use the top in the CTE.
A table to test on with some data:
create table dbo.[User]
(
ID int identity primary key,
Name nvarchar(128)
)
go
insert into dbo.[User](Name)
select top(1000) Name
from sys.all_objects
A query without the redundant top expressions.
DECLARE @pageSize INT, @currentPage INT, @top INT;
SET @pageSize = 10;
SET @currentPage = 1;
SET @top = @pageSize * @currentPage + @pageSize;
with C as
(
select U.ID,
U.Name,
row_number() over(order by U.ID) as rn
from dbo.[User] as U
)
select C.ID,
C.Name
from C
where C.rn > @pageSize * @currentPage and
C.rn <= @pageSize * (@currentPage + 1);
This will give you a query plan like this:
The number by each operator is the number of rows actually fetched. The clustered index scan reads 20 rows ordered by ID
. Segment and Sequence Project enumerates the rows. Top is the operator that makes sure that no more than 20 rows is fetched. The filter removes the rows 1 to 10 and let the rows 11 to 20 through.
If we instead try to get page 5 (@currentPage = 5
to get rows 51 to 60) the plan will look like this:
Top operator makes sure only 60 rows is read from the clustered index and the filter filters out the first 50 rows to return the last 10 rows.
Using your last query with the extra top expressions will not add anything of value. Only one extra redundant top operator.
The key to understand what is going on in the query plan is to know that execution is done from left to right demanding one row at a time. That is how the top operator can stop the clustered index scan when enough rows are returned.
Upvotes: 1
Reputation: 1269723
The use of top
without an order by
is discouraged. There is no guarantee that you will get the rows that you want, so you should not include the top
. Or, you should include an order by id
, if that is the ordering that you want.
The user of top
doesn't affect the row_number()
calculation, because that calculation is going to be done before the top
is applied. You can imagine having another window function there, such as sum() over ()
to understand that the top
cannot generally be applied before the row_number()
and finding the circumstances where it is safe would be hard work.
Upvotes: 2