Reputation: 3589
We have a table that contains unique codes. To generate a new unique code we are using an approach found in the below SQL statement and uncovered cases where the NOT EXISTS
statement seemingly allows rows through that exist.
There is no concurrency issues as this was proven out in a sandbox using a single query being ran against SQL Server 2016. If we place the ORDER BY
statement it suddenly works as expected. It appears as if without the ORDER BY
that this query is conditionally ignoring the WHERE
clause. In the event all codes collide I would expect @code
to either be NULL or remain it's initial state of 0.
DECLARE @code int = 0;
select @code = Code from (
SELECT top 1 randoms.Code
FROM (
VALUES
(CAST((abs(CHECKSUM(newid())) % 1000000) AS INT)),
(CAST((abs(CHECKSUM(newid())) % 1000000) AS INT)),
(CAST((abs(CHECKSUM(newid())) % 1000000) AS INT)),
(CAST((abs(CHECKSUM(newid())) % 1000000) AS INT)),
(CAST((abs(CHECKSUM(newid())) % 1000000) AS INT))
) randoms (Code)
WHERE NOT EXISTS (SELECT 1 FROM TEST_Codes uc WHERE uc.Code = randoms.Code)
) c;
SELECT
c.code,
ud.*
FROM (VALUES (@code)) as c(Code)
LEFT OUTER JOIN TEST_Codes ud
ON ud.Code = c.Code
This statement will allow duplicates to be returned, which is baffling due to the WHERE NOT EXISTS
statement.
If we change the definition of the view c
to be ) c ORDER BY c.Code
it suddenly works. Why is this?
Upvotes: 3
Views: 1102
Reputation: 453067
Sql Server does not guarantee how many times it will execute compute scalars and similar expressions. It is possible the reference in the where is using a different value than the one selected but when you add an order by it materialises it and only calculates it once per row.
If you are on 2014 or above you can use an extended events session on query_trace_column_values
to see this happening.
DECLARE @TestCodes TABLE(Code int)
dbcc traceon(2486);
set statistics xml on;
select Code from (
SELECT randoms.Code
FROM (
VALUES
(CAST((abs(CHECKSUM(newid())) % 1000000) AS INT)),
(CAST((abs(CHECKSUM(newid())) % 1000000) AS INT)),
(CAST((abs(CHECKSUM(newid())) % 1000000) AS INT)),
(CAST((abs(CHECKSUM(newid())) % 1000000) AS INT)),
(CAST((abs(CHECKSUM(newid())) % 1000000) AS INT))
) randoms (Code)
WHERE NOT EXISTS (SELECT 1 FROM @TestCodes uc WHERE uc.Code = randoms.Code)
) c
option(recompile);
set statistics xml off;
dbcc traceoff(2486);
The column Union1005
is output from the constant scan at the top right. It is also referenced again in the join predicate. At this point it is re-evaluated and returns a different number.
You may be able to hack around with the query and get it to only be evaluated once but nothing is guaranteed. The only 100% safe way is to materialise the random numbers up front (e.g. into a temp table) before doing the check so you are guaranteed that they aren't going to be recalculated and change under you.
An example of hacking about with the SQL to get a non guaranteed result is below. I would not use this as it has the disadvantages that it it still guarantees nothing and also even if it works if you pick the top 1 off it your "random" numbers will no longer be as well distributed. It introduces a bias for lower numbers.
select Code from (
SELECT TOP 5 randoms.Code
FROM (
VALUES
(CAST((abs(CHECKSUM(newid())) % 1000000) AS INT)),
(CAST((abs(CHECKSUM(newid())) % 1000000) AS INT)),
(CAST((abs(CHECKSUM(newid())) % 1000000) AS INT)),
(CAST((abs(CHECKSUM(newid())) % 1000000) AS INT)),
(CAST((abs(CHECKSUM(newid())) % 1000000) AS INT))
) randoms (Code)
order by Code
) T
WHERE NOT EXISTS (SELECT 1 FROM @TestCodes uc WHERE uc.Code = T.Code)
This materialises it and the value output from the sort is the same as that used in the nested loops predicate.
Upvotes: 4
Reputation: 1298
The Hypothesis
You need a function that creates 5 random values that you want to ensure are not from a table that holds your code values.
We assume your TEST_CODE does not contain duplicates either, and together with the TOP 1 operator the developer can fairly assume his numbers will always be random.
THE PROBLEM
@Martin.Smith makes some good observations about the way this hack...and yes, it is a hack method. Functions do not allow RAND() function, and what you seek is not so much a solution as a workaround. Note their reason for denying NEWID()
Invalid use of side-effecting or time-dependent operator in 'newid()' within a function.
What will Microsoft say? They clearly did not intend to allow Non-deterministic functions so you may not be supported here anyways.
Instead, treat this like the Insert operation that it is. You only care about one random value where the only requirement is that it not match any of the existing codes, whether active or passive.
ADDED - Furthermore, you want to make your code portable, perhaps even in a job. Use a stored procedure instead.
Testing Methods:
CREATE TABLE Test_Codes (Code INT, ud NVARCHAR(50) )
GO
DECLARE @COUNT INT = 1;
WHILE (@Count < 100000)
BEGIN
INSERT INTO Test_Codes (Code, ud)
VALUES (CAST((abs(CHECKSUM(newid())) % 1000000) AS INT)
, 'PoliceNumber[' + CAST(@Count AS NVARCHAR(7) ) + ']')
SET @COUNT = @COUNT + 1
END
GO
An Alternative Solution
CREATE PROCEDURE dbo.USP_Code_INS (@Name NVARCHAR(100) )
AS
BEGIN
INSERT INTO Test_Codes (Code, ud)
SELECT TOP 1 B.Code, @Name
FROM Test_Codes A
RIGHT OUTER JOIN ( SELECT randoms.Code
FROM (
VALUES
(CAST((abs(CHECKSUM(newid())) % 1000000) AS INT)),
(CAST((abs(CHECKSUM(newid())) % 1000000) AS INT)),
(CAST((abs(CHECKSUM(newid())) % 1000000) AS INT)),
(CAST((abs(CHECKSUM(newid())) % 1000000) AS INT)),
(CAST((abs(CHECKSUM(newid())) % 1000000) AS INT))
) randoms (Code) ) B ON B.Code = A.Code
WHERE A.Code IS NULL
END
GO
Note the PREDICATE
here is a simple NULL
comparison. You are guaranteed to get a unique value with a solution that is tabular. If you are concerned by the occasion that no rows will be inserted, then add a fail-safe check (insert operations are expensive).
SELECT TOP 1 B.Code, @Name
FROM Test_Codes A
RIGHT OUTER JOIN (table of NEWID() rows) B
WHERE A.Code IS NULL
You join the tables by any matching rows based on the smallest table (B) and run a predicate that removes any result where Test_Codes has a match. The predicate uses a boolean comparisons of NULL or not NULL, which the optimizer is highly efficient at guessing and sorting through.
This is a set-based solution and fundamentally is how relational languages work.
/* Dynamic Method. Allows to create a job that is simple as a flag to optimize */
CREATE PROCEDURE dbo.USP_Code_INS2 (@BatchNumber INT, @Name NVARCHAR(100) )
AS
BEGIN
DECLARE @BatchNewID NVARCHAR(MAX) = N'(CAST((abs(CHECKSUM(newid())) % 1000000) AS INT))'
, @Count INT = 1
, @SQLStatement NVARCHAR(MAX)
, @ParmDefinitions NVARCHAR(100) = N'@Name NVARCHAR(100)'
/* Dynamically create newid() table to flag-size. */
WHILE @Count < @BatchNumber AND @BatchNumber > 1
BEGIN
SET @BatchNewID = @BatchNewID + N'
, (CAST((abs(CHECKSUM(newid())) % 1000000) AS INT))'
SET @Count = @Count + 1
END
/* Now insert into SQL statement */
SET @SQLStatement = N'INSERT INTO Test_Codes (Code, ud)
SELECT TOP 1 B.Code, @Name
FROM Test_Codes A
RIGHT OUTER JOIN ( SELECT randoms.Code
FROM (
VALUES ' + @BatchNewID
+ N') randoms (Code) ) B ON B.Code = A.Code
WHERE A.Code IS NULL'
--PRINT @SQLStatement
EXEC sp_executesql @SQLStatement, @ParmDefinitions
, @Name = @Name
END
Conclusion
Benefits of Stored Procedures
• Precompiled execution: SQL Server compiles each Stored Procedure once and then re utilizes the execution plan. This results in tremendous performance boosts when Stored Procedures are called repeatedly.
• Reduced client/server traffic: If network bandwidth is a concern in your environment then you’ll be happy to learn that Stored Procedures can reduce long SQL queries to a single line that is transmitted over the wire.
• Efficient reuse of code and programming abstraction: Stored Procedures can be used by multiple users and client programs. If you utilize them in a planned manner then you’ll find the development cycle requires less time.
• Enhanced security controls: You can grant users permission to execute a Stored Procedure independently of underlying table
Upvotes: 1