SELECT TOP 1 without ORDER BY Issue ignores WHERE statement

Question

We have a table that contains unique codes. To generate a new unique code we use the approach in the SQL statement below, and we have uncovered cases where the NOT EXISTS predicate seemingly lets through rows that already exist.

There are no concurrency issues; this was proven out in a sandbox by running a single query against SQL Server 2016. If we add an ORDER BY clause, it suddenly works as expected. Without the ORDER BY, the query appears to be conditionally ignoring the WHERE clause. In the event that all codes collide, I would expect @code either to be NULL or to remain at its initial value of 0.

DECLARE @code int = 0;

    select  @code = Code from (
        SELECT top 1 randoms.Code
        FROM (
            VALUES 
            (CAST((abs(CHECKSUM(newid())) % 1000000) AS INT)),
            (CAST((abs(CHECKSUM(newid())) % 1000000) AS INT)),
            (CAST((abs(CHECKSUM(newid())) % 1000000) AS INT)),
            (CAST((abs(CHECKSUM(newid())) % 1000000) AS INT)),
            (CAST((abs(CHECKSUM(newid())) % 1000000) AS INT))
        ) randoms (Code)
        WHERE NOT EXISTS (SELECT 1 FROM TEST_Codes uc WHERE uc.Code = randoms.Code)
    ) c;


    SELECT 
        c.code,
        ud.*
    FROM (VALUES (@code)) as c(Code)
    LEFT OUTER JOIN TEST_Codes ud
        ON ud.Code = c.Code

This statement can return a code that already exists in TEST_Codes, which is baffling given the WHERE NOT EXISTS predicate.

If we change the end of the derived table c to ) c ORDER BY c.Code, it suddenly works. Why is this?

Martin Smith · Accepted Answer

SQL Server does not guarantee how many times it will evaluate compute scalars and similar expressions. It is possible that the reference in the WHERE clause uses a different value from the one that is selected. When you add an ORDER BY, the sort materialises the value, so it is calculated only once per row.

If you are on SQL Server 2014 or later, you can use an Extended Events session on query_trace_column_values to see this happening.
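A minimal sketch of such a session follows. The session name TraceColumnValues and the ring_buffer target are illustrative choices, not part of the original answer, and the sketch assumes (as the script below implies) that the event only captures values while trace flag 2486 and SET STATISTICS XML ON are active for the query. Create and start the session, run the script below, then read the captured events (or use Watch Live Data in SSMS) before dropping the session.

```sql
-- Hypothetical session name; creating it needs ALTER ANY EVENT SESSION permission.
CREATE EVENT SESSION [TraceColumnValues] ON SERVER
    ADD EVENT sqlserver.query_trace_column_values
    ADD TARGET package0.ring_buffer;
GO
ALTER EVENT SESSION [TraceColumnValues] ON SERVER STATE = START;
GO

-- ... run the repro script here (trace flag 2486 + SET STATISTICS XML ON) ...

-- Read whatever the ring buffer has captured so far, as XML.
SELECT CAST(t.target_data AS xml) AS captured_events
FROM sys.dm_xe_sessions AS s
JOIN sys.dm_xe_session_targets AS t
    ON t.event_session_address = s.address
WHERE s.name = N'TraceColumnValues'
  AND t.target_name = N'ring_buffer';
GO

-- Clean up when finished.
ALTER EVENT SESSION [TraceColumnValues] ON SERVER STATE = STOP;
DROP EVENT SESSION [TraceColumnValues] ON SERVER;
GO
```

In the captured events, look for the same column being reported with different values at different nodes of the plan.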

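DECLARE @TestCodes TABLE(Code int)  -- empty stand-in for TEST_Codes
dbcc traceon(2486);                 -- trace flag used with query_trace_column_values
set statistics xml on;              -- capture the actual plan for the query below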

    select  Code from (
        SELECT randoms.Code
        FROM (
            VALUES 
            (CAST((abs(CHECKSUM(newid())) % 1000000) AS INT)),
            (CAST((abs(CHECKSUM(newid())) % 1000000) AS INT)),
            (CAST((abs(CHECKSUM(newid())) % 1000000) AS INT)),
            (CAST((abs(CHECKSUM(newid())) % 1000000) AS INT)),
            (CAST((abs(CHECKSUM(newid())) % 1000000) AS INT))
        ) randoms (Code)
        WHERE NOT EXISTS (SELECT 1 FROM @TestCodes uc WHERE uc.Code = randoms.Code)
    ) c
     option(recompile);


set statistics xml off;
dbcc traceoff(2486);

In the execution plan, the column Union1005 is output from the Constant Scan at the top right. It is referenced again in the nested loops join predicate, and at that point it is re-evaluated and returns a different number.

You may be able to rework the query so that the expression is evaluated only once, but nothing is guaranteed. The only 100% safe way is to materialise the random numbers up front (e.g. into a temp table) before doing the check, so that you are guaranteed they will not be recalculated and change under you.
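A sketch of that approach, under some assumptions: @TestCodes is an empty table variable standing in for TEST_Codes (as in the repro script), #Candidates is a hypothetical temp table name, and five candidates mirror the original query. Because the candidates are stored before the NOT EXISTS check, the value that is checked and the value that is assigned are the same stored value.

```sql
-- Stand-in for the question's TEST_Codes table (hypothetical, as in the repro).
DECLARE @TestCodes TABLE (Code int PRIMARY KEY);
DECLARE @code int = 0;

-- Hypothetical temp table holding the materialised candidates.
CREATE TABLE #Candidates (Code int NOT NULL);

-- NEWID() is evaluated here, once per row, and the results are stored.
-- Taking the modulo before ABS avoids the arithmetic overflow ABS raises
-- if CHECKSUM happens to return -2147483648.
INSERT INTO #Candidates (Code)
SELECT ABS(CHECKSUM(NEWID()) % 1000000)
FROM (VALUES (1), (2), (3), (4), (5)) AS n (i);

-- The check and the assignment now read the same stored values.
SELECT TOP (1) @code = c.Code
FROM #Candidates AS c
WHERE NOT EXISTS (SELECT 1 FROM @TestCodes AS uc WHERE uc.Code = c.Code);

DROP TABLE #Candidates;

SELECT @code AS NewCode;
```

If every candidate collides, the assignment SELECT returns no rows and @code keeps its initial value of 0.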

Below is an example of rewriting the SQL in a way that happens to work but is not guaranteed to. I would not use it: it still guarantees nothing, and even when it works, if you take the TOP 1 from it your "random" numbers will no longer be as well distributed, because the ORDER BY introduces a bias towards lower numbers.

select  Code from (
    SELECT TOP 5 randoms.Code
    FROM (
        VALUES 
        (CAST((abs(CHECKSUM(newid())) % 1000000) AS INT)),
        (CAST((abs(CHECKSUM(newid())) % 1000000) AS INT)),
        (CAST((abs(CHECKSUM(newid())) % 1000000) AS INT)),
        (CAST((abs(CHECKSUM(newid())) % 1000000) AS INT)),
        (CAST((abs(CHECKSUM(newid())) % 1000000) AS INT))
    ) randoms (Code)
    order by Code
    ) T
    WHERE NOT EXISTS (SELECT 1 FROM @TestCodes uc WHERE uc.Code = T.Code)

The sort materialises the values, so the value output from the sort is the same one used in the nested loops predicate.
