Asynchronous
Asynchronous

Reputation: 3977

Where to Set Condition with Where Clause in CTE With Rownumber

I am a little confused as to where would be the preferred place to add the condition when using CTE with ROW_NUMBER OVER PARTITION.

I have a table containing the following columns:

UserID, BranchNumber, MemberDate and MemberStatus

Note: A member can have multiple memberships at different locations:

The following code gives me one less record: 17069

WITH CTE AS
(
SELECT 
 *
,ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY [MemberDate] DESC) AS RowNumber 

FROM MemberTable 

WHERE BranchNumber = '01'
) 
SELECT * FROM CTE WHERE RowNumber = 1 AND MemberStatus = 'Active'

The following code gives one additional record: 17070

WITH CTE AS
(
SELECT 
 *
,ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY [MemberDate] DESC) AS RowNumber 

FROM MemberTable 

WHERE BranchNumber = '01' AND MemberStatus = 'Active'
) 
SELECT * FROM CTE WHERE RowNumber = 1

I am just confused as to why the difference and which is the right way?

The correct amount of records is 19000.

Upvotes: 4

Views: 8436

Answers (4)

M.Ali
M.Ali

Reputation: 69524

Since you have mentioned "A member can have multiple memberships at different locations" , try partition it by the "BranchNumber" as well and see if it gives you the result you are looking for.

The way you are doing it, it is returning the Very last Membership start date for any location, if you partition by the BranchNumber it will return the very last Membership start date for each user for each Branch Number , try something like this

;WITH  
CTE 
AS
  (
   SELECT *
   ,ROW_NUMBER() OVER (PARTITION BY UserID, BranchNumber ORDER BY [MemberDate] DESC)  
                                                                   AS RowNumber 
    FROM MemberTable 

  ) 
SELECT * FROM CTE
WHERE RowNumber = 1 AND MemberStatus = 'Active'

Upvotes: 0

usr
usr

Reputation: 171178

The ROW_NUMBER is evaluated after the WHERE. Let's make it more intuitive by splitting the two apart:

WITH CTE1 AS
(
SELECT 
 *

FROM MemberTable 

WHERE BranchNumber = '01' AND MemberStatus = 'Active'
) 
, CTE2 AS (
 SELECT *, ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY [MemberDate] DESC) AS RowNumber 
  FROM CTE1
)
SELECT * FROM CTE2 WHERE RowNumber = 1

I think it is pretty clear now that you need the filter applied first. Think of the CTE's as virtual tables. You can debug this by changing the final select to SELECT * FROM CTE1. Just look at what is returned.

Upvotes: 1

Christian Phillips
Christian Phillips

Reputation: 18759

The queries are different, so I would expect the records to be different.

The first query says:

  • Get All records where BranchNumber = 1, order by the MemberDate and partition on userId
  • Then, from that result set, get me the 1st record for each user, only if MemberStatus is Active

The second query says:

  • Get me all activemember records from BranchNumber = 1, order by the MemberDate and partition on userId
  • Then, from the result above, get me the first instance.

So, the first result set may be getting the latest record from the user, inactive or not, and then you filter out the inactive ones. The second will only return active records, so I would expect more.

Upvotes: 0

a1ex07
a1ex07

Reputation: 37364

Both are "right" in a sense they return what is asked. (2) gives more records because you apply extra condition (MemberStatus = 'Active') in the subquery(cte). So record where MemberStatus are not equal to "Active" cannot have "RowNumber=1" . (1) doesn't filter such rows in CTE, so it's possible that it returns record[s] with RowNumber=1 and MemberStatus <> 'Active' which got removed from final resultset by applying condition in outer query.

Upvotes: 1

Related Questions