nikolifish
nikolifish

Reputation: 512

row_number with partition value changing

I have this piece of sql code that finds the row_number based on some values in a table __working that joins into a lookup table __Eval

;WITH r 
 AS (
    select 

    w.uid,
    t.TypeId,
    --weight
    ROW_NUMBER () OVER (PARTITION BY w.uid ORDER BY  DIFFERENCE(t.val1, w.val1) +  DIFFERENCE(t.val2, w.val2) + DIFFERENCE(t.val3, w.val3) + DIFFERENCE(t.val4, w.val4)  DESC) as Score
    ,w.account

    from __Working w
        join __eval t on w.val1 like t.val1 and IsNull(w.val4, '') like t.val4 and IsNull(w.val2, '') like t.val2 and IsNull(w.val3, '') like t.val3

)

select * from r     where   r.account = 1 and score = 1

this returns a typeId = 1

however if I write it like this

   ;WITH r 
     AS (
        select 

        w.uid,
        t.TypeId,
        --weight
        ROW_NUMBER () OVER (PARTITION BY w.uid ORDER BY  DIFFERENCE(t.val1, w.val1) +  DIFFERENCE(t.val2, w.val2) + DIFFERENCE(t.val3, w.val3) + DIFFERENCE(t.val4, w.val4)  DESC) as Score
        ,w.account

        from __Working w
            join __eval t on w.val1 like t.val1 and IsNull(w.val4, '') like t.val4 and IsNull(w.val2, '') like t.val2 and IsNull(w.val3, '') like t.val3
            where   r.account = 1
    )

    select * from r     where   r.account = 1 and score = 1

it returns TypeId = 2. I would expect that if i had multiple UIDs across different accounts in __working, but I don't. What am i missing here?

Upvotes: 0

Views: 873

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269443

Oh, this is a weirdness of unstable sorts. Your row_number() expression is:

 ROW_NUMBER() OVER (PARTITION BY w.uid
                    ORDER BY  DIFFERENCE(t.val1, w.val1) +
                              DIFFERENCE(t.val2, w.val2) +
                              DIFFERENCE(t.val3, w.val3) +
                              DIFFERENCE(t.val4, w.val4)  DESC
                   ) as Score

The problem is that multiple rows have the same value for the ORDER BY key. Different invocations arbitrary choose which of these multiple rows is first, second, and so on.

The canonical solution is to include some sort of unique key so the sort is stable:

 ROW_NUMBER() OVER (PARTITION BY w.uid
                    ORDER BY  (DIFFERENCE(t.val1, w.val1) +
                               DIFFERENCE(t.val2, w.val2) +
                               DIFFERENCE(t.val3, w.val3) +
                               DIFFERENCE(t.val4, w.val4)
                              ) DESC,
                              ??  -- perhaps typeId
                   ) as Score

However, I might suggest a more arduous solution. Accept the fact that ties might exist, and use rank() and dense_rank() to identify them. Then, figure out explicitly what to do in the case of a tie -- perhaps all are equally interesting to you or perhaps you have some other method of breaking ties.

Upvotes: 2

Related Questions