ca9163d9
ca9163d9

Reputation: 29159

Window function behaves differently in Subquery/CTE?

I thought the following three SQL statements are semantically the same. The database engine will expand the second and third query to the first one internally.

select .... 
from T 
where Id = 1

select * 
from 
    (select .... from T) t 
where Id = 1

select * 
from 
    (select .... from T where Id = 1) t

However, I found the window function behaves differently. I have the following code.

-- Prepare test data
with t1 as 
( 
    select *
    from (values ( 2, null), ( 3, 10), ( 5, -1), ( 7, null), ( 11, null), ( 13, -12), ( 17, null), ( 19, null), ( 23, 1759) ) v ( id, col1 )
)
select * 
into #t 
from t1

alter table #t add primary key (id)
go

The following query returns all the rows.

select  
     id, col1,
     cast(substring(max(cast(id as binary(4)) + cast(col1 as binary(4))) 
                       over (order by id
                             rows between unbounded preceding and 1 preceding), 5, 4) as int) as lastval
from    
    #t

id  col1    lastval
-------------------
2   NULL    NULL
3   10      NULL
5   -1      10
7   NULL    -1
11  NULL    -1
13  -12     -1
17  NULL    -12
19  NULL    -12
23  1759    -12

Without CTE/subquery: then I added a condition just return the row which Id = 19.

select  
    id, col1,
    cast(substring(max(cast(id as binary(4)) + cast(col1 as binary(4))) over (order by id rows between unbounded preceding and 1 preceding), 5, 4) as int) as lastval
from    
    #t
where
    id = 19;

However, lastval returns null?

With CTE/subquery: now the condition is applied to the CTE:

with t as 
( 
    select   
        id, col1,
        cast(substring(max(cast(id as binary(4)) + cast(col1 as binary(4))) over (order by id rows between unbounded preceding and 1 preceding ), 5, 4) as int) as lastval
    from     
        #t)
select *
from t
where id = 19;

-- Subquery
select  
    *
from
    (select   
         id, col1,
         cast(substring(max(cast(id as binary(4)) + cast(col1 as binary(4))) over (order by id rows between unbounded preceding and 1 preceding), 5, 4) as int) as lastval
     from     
         #t) t
where   
    id = 19;

Now lastval returns -12 as expected?

Upvotes: 2

Views: 1761

Answers (3)

Paul Williams
Paul Williams

Reputation: 17020

The logic order of operations of the SELECT statement is import to understand the results of your first example. From the Microsoft documentation, the order is, from top to bottom:

  1. FROM
  2. ON
  3. JOIN
  4. WHERE
  5. GROUP BY
  6. WITH CUBE or WITH ROLLUP
  7. HAVING
  8. SELECT
  9. DISTINCT
  10. ORDER BY
  11. TOP

Note that the WHERE clause processing happens logically before the SELECT clause.

The query without the CTE is being filtered where id = 19. The order of operations causes the where to process before the window function in the select clause. There is only 1 row with an id of 19. Therefore, the where limits the rows to id = 19 before the window function can process the rows between unbounded preceding and 1 preceding. Since there are no rows for the window function, the lastval is null.

Compare this to the CTE. The outer query's filter has not yet been applied, so the CTE operates an all of the data. The rows between unbounded preceding finds the prior rows. The outer part of the query applies the filter to the intermediate results returns just the row 19 which already has the correct lastval.

You can think of the CTE as creating a temporary #Table with the CTE data in it. All of the data is logically processed into a separate table before returning data to the outer query. The CTE in your example creates a temporary work table with all of the rows that includes the lastval from the prior rows. Then, the filter in the outer query gets applied and limits the results to id 19.

(In reality, the CTE can shortcut and skip generating data, if it can do so to improve performance without affecting the results. Itzik Ben-Gan has a great example of a CTE that skips processing when it has returned enough data to satisfy the query.)

Consider what happens if you put the filter in the CTE. This should behave exactly like the first example query that you provided. There is only 1 row with an id = 19, so the window function does not find any preceding rows:

with t as ( select id, col1,
            cast(substring(max(cast(id as binary(4)) + cast(col1 as binary(4))) over ( order by id
               rows between unbounded preceding and 1 preceding ), 5, 4) as int) as lastval
               from #t
               where id = 19 -- moved filter inside CTE
             )
    select  *
    from t

Upvotes: 2

Cristhian Valencia
Cristhian Valencia

Reputation: 3

The querys you are comparing are not equivalent.

select  id ,
        (... ) as lastval
from    #t
where   id = 19;

will take only 1 row, so 'lastval' will take NULL from col1 as for the windowed function does not find preceding row.

Upvotes: 0

SqlZim
SqlZim

Reputation: 38023

Window functions operate on your result set, so when you added where id = 19 your result set only had 1 row. Since your window function specifies rows between unbounded preceding and 1 preceding there was no preceding row, and resulted in null.

By using the subquery/cte you are allowing the window function to operate over the unfiltered result set (where the preceding rows exist), then retrieving only those rows from that result set where id = 19.

Upvotes: 0

Related Questions