Brandon McClure
Brandon McClure

Reputation: 1390

Why does ROW_NUMBER in a view not respect filters

I am using SQL Server 2014.

I have created a view which surfaces patient history answers such as tabaco use, alcohol use, etc. These answers are time stamped but not linked to the appointment identifiers, and I need to find the most recent answer relative to the appointment date.

The data the view surfaces looks like this:

PATIENT_ID | ANSWER_DATE | TOBACCO_USE
 1         |  1/1/2018   |  No
 1         |  1/5/2018   |  Yes
 1         |  1/10/2018  |  Quit
 2         |  1/1/2018   |  No

I know I can use ROW_NUMBER() in a inline query when I join to this table to get the ranking I need, but I really want to add ROW_NUMBER()OVER(PARTITION BY PATIENT_ID ORDER BY ANSWER_DATE DESC) as 'rnkDesc' column to the view, to make it simpler for other developers to properly join to this table.

With this new column a SELECT * from the view looks like this:

PATIENT_ID | ANSWER_DATE | TOBACCO_USE | rnkDesc
 1         |  1/1/2018   |  No         |  3
 1         |  1/5/2018   |  Yes        |  2
 1         |  1/10/2018  |  Quit       |  1
 2         |  1/1/2018   |  No         |  1

That is as expected, now I join from my appointments table like so:

FROM APPOINTMENTS appt 
LEFT JOIN myHistoryView his 
on appt.PATIENT_ID = his.PATIENT_ID 
    and his.ANSWER_DATE <= APPT.APPT_DATE
    and his.rnkDesc = 1

This does not work though, as it appears like the ROW_NUMBER is evaluated before the filters are applied. If I filter my view where PATIENT_ID = 1 and ANSWER_DATE = 1/5/2018 then rnkDesc still shows 2, instead of 1 like it would if I was using ROW_NUMBER in an inline query.

I am really interested in why this behaves this way. I can code around it by using an inline query. I know that these ranking functions are nondeterministic, and would have thought the engine would filter the result set in the view before it generates the ROW_NUMBER. I tried this with RANK and DENSE_RANK as well, at it appears that these also behave the same way. (determined before the filters are applied.)

Upvotes: 1

Views: 1321

Answers (2)

Steve
Steve

Reputation: 960

It's important to understand that there is a clear, indeed deterministic, order of evaluation in SQL.

The ROW_NUMBER on the view has already been evaluated before the left-join occurs. The ON clause does not act as a "filter" on the table to which the join refers, but as a condition that must be met for a join between the tables to occur.

You could of course create a "parameterised view" (a table-valued inline function), which allows you to pass in the filter date to a where-clause before the ROW_NUMBER is applied in the select-clause, and then outer-apply onto it. That may be appropriate if a large number of queries use the same fuctionality.

But otherwise I'd be inclined to leave the "substance use history" view unadulterated with any row-numbering (unless it is used independently by other queries to get the absolute latest answer), and write the "the latest row on or before the current appointment" logic inline.

Upvotes: 2

AaronLS
AaronLS

Reputation: 38365

If you implemented this with CTEs or a subquery, you'd see the same results. This is necessary because sometimes you need to generate a rank on a result, and then have that rank be unchanged by outer queries. So it is as if the rank is generated first as part of the subquery, and then it is "locked in" so you can filter the results based on that row number.

Let's imagine if one of your filters in the outer query was actually rnkDesc = 2, which is a way that sometimes you can do things like get "2nd most". Imagine if the row number was not generated until after the outer query filter was generated, this would make this type of approach impossible. How do you filter the results on the value of something that hasn't been determined yet? This is the same reason that filtering on a window function usually requires first nesting a subquery or a CTE, so you can filter on the generated results, and those results don't get renumbered/ranked in the outer dynamically.

Therefore it makes sense to lock-in windowed function results based on the nesting they occur in. You have to kind of think about this kind of nesting in terms of subresults.

So that answers your question of "Why" it is this way. I understand what you're trying to achieve though and why you want to do this. You're trying to simplify the use of the window function and have it apply dynamically to the final result. I'm honestly not sure "how" you'd do this without just including the window function in the outer query. There may be a way to embed it in a UDF, but I'm not sure.

Upvotes: 2

Related Questions