Update behaviour

Question

When I made a mistake in update query text, I spotted unpredictable query result. Here is query text for update.

DECLARE @T TABLE (Id int,[Name] nvarchar(100),RNA int)

INSERT INTO @T(Id,[Name])
SELECT [Id],[Name] 
FROM (VALUES (1,N'D'),
             (2,N'B'),
             (3,N'S'),
             (4,N'A'),
             (5,N'F')
     ) AS vtable([Id],[Name])

UPDATE @T
SET RNA = T.RN
FROM (
  select PP.Name,ROW_NUMBER() OVER(ORDER BY PP.Name) RN,PP.RNA from @T PP
) T

select * from @T

I know where mistake was made:

UPDATE @T

should be

UPDATE T

But why result (with "bad" query) looks like:

Id   Name  RNA
---- ----- -------
1    D     1
2    B     5
3    S     1
4    A     5
5    F     1

I suspect that 1 and 5 values are MIN(Id) and MAX(Id). Execution plan look like:

Will this situation be the same in every situation with this kind of mistake?

If yes, has this behaviour any practical value?

GarethD · Accepted Answer

The situation will not be the same for every kind of mistake. You have a non-determinisitic update statement, that is to say theoritically any of the values for RN in your subquery T could be applied to any of the values in @T. You are essentially running the UPDATE version of this:

SELECT  *
FROM    @t a
        CROSS JOIN
        (   SELECT  TOP 1
                    PP.Name,
                    ROW_NUMBER() OVER(ORDER BY PP.Name) RN,
                    PP.RNA 
            FROM    @T PP
            ORDER BY NEWID()
        ) T
OPTION (FORCE ORDER);

The online manual states:

The results of an UPDATE statement are undefined if the statement includes a FROM clause that is not specified in such a way that only one value is available for each column occurrence that is updated, that is if the UPDATE statement is not deterministic.

What is slightly interesting is that if you run the above you will get a different result each time (barring the 1/25 chance of getting the same result twice in a row), if you remove the random sorting using NEWID() you will get the same value of RN for each row, but the update consistently returns the same results, with 2 different RNs. I am not surprised the result remains consistent with no random ordering because with no changes to the data, and no random factor introduced I would expect the optimiser to come up with the same execution plan no matter how many times it is run.

Since no explicit ordering is specified in your update query, the order is due to the order of the records on the leaf, if the order of the records is altered the result is altered. This can be shown by inserting the records of @T into a new table with different IDs

DECLARE @T2 TABLE (Id int,[Name] nvarchar(100),RNA int);
INSERT @T2 
SELECT  id, Name, NULL
FROM    @T
ORDER BY ROW_NUMBER() OVER(ORDER BY  NEWID())
OPTION (FORCE ORDER);

UPDATE @T2
SET RNA = T.RN
FROM (
  select PP.Name,ROW_NUMBER() OVER(ORDER BY PP.Name) RN,PP.RNA from @T2 PP
) T

SELECT  *
FROM    @T2;

I can see no reason why this is always the min or max value of RN though, I expect you would have to delve deep into the optimiser to find this. Which is probably a new question better suited for the dba stack exchange.

Update behaviour

Answers (1)

Related Questions