Reputation: 512
I have this piece of sql code that finds the row_number based on some values in a table __working that joins into a lookup table __Eval
;WITH r
AS (
select
w.uid,
t.TypeId,
--weight
ROW_NUMBER () OVER (PARTITION BY w.uid ORDER BY DIFFERENCE(t.val1, w.val1) + DIFFERENCE(t.val2, w.val2) + DIFFERENCE(t.val3, w.val3) + DIFFERENCE(t.val4, w.val4) DESC) as Score
,w.account
from __Working w
join __eval t on w.val1 like t.val1 and IsNull(w.val4, '') like t.val4 and IsNull(w.val2, '') like t.val2 and IsNull(w.val3, '') like t.val3
)
select * from r where r.account = 1 and score = 1
this returns a typeId = 1
however if I write it like this
;WITH r
AS (
select
w.uid,
t.TypeId,
--weight
ROW_NUMBER () OVER (PARTITION BY w.uid ORDER BY DIFFERENCE(t.val1, w.val1) + DIFFERENCE(t.val2, w.val2) + DIFFERENCE(t.val3, w.val3) + DIFFERENCE(t.val4, w.val4) DESC) as Score
,w.account
from __Working w
join __eval t on w.val1 like t.val1 and IsNull(w.val4, '') like t.val4 and IsNull(w.val2, '') like t.val2 and IsNull(w.val3, '') like t.val3
where r.account = 1
)
select * from r where r.account = 1 and score = 1
it returns TypeId = 2. I would expect that if i had multiple UIDs across different accounts in __working, but I don't. What am i missing here?
Upvotes: 0
Views: 873
Reputation: 1269443
Oh, this is a weirdness of unstable sorts. Your row_number()
expression is:
ROW_NUMBER() OVER (PARTITION BY w.uid
ORDER BY DIFFERENCE(t.val1, w.val1) +
DIFFERENCE(t.val2, w.val2) +
DIFFERENCE(t.val3, w.val3) +
DIFFERENCE(t.val4, w.val4) DESC
) as Score
The problem is that multiple rows have the same value for the ORDER BY
key. Different invocations arbitrary choose which of these multiple rows is first, second, and so on.
The canonical solution is to include some sort of unique key so the sort is stable:
ROW_NUMBER() OVER (PARTITION BY w.uid
ORDER BY (DIFFERENCE(t.val1, w.val1) +
DIFFERENCE(t.val2, w.val2) +
DIFFERENCE(t.val3, w.val3) +
DIFFERENCE(t.val4, w.val4)
) DESC,
?? -- perhaps typeId
) as Score
However, I might suggest a more arduous solution. Accept the fact that ties might exist, and use rank()
and dense_rank()
to identify them. Then, figure out explicitly what to do in the case of a tie -- perhaps all are equally interesting to you or perhaps you have some other method of breaking ties.
Upvotes: 2