Reputation: 1
Problem is described below:
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
Sample Output: 445rpxlKYPkj1pg4q8nAy7Ab91zxZ8v1
I can do this using Java, but will greatly appreciate if you could help me do this on MS SQL or T-SQL.
Upvotes: 0
Views: 5562
Reputation: 17915
I made this generic enough to handle any pool of characters and any output length. The core idea is to take a random sequence of bytes and use a base conversion algorithm to convert a long number into a new representation then translated to a string using your desired characters as its "digits".
For your specific scenario we need about 183 bits, or log2(52) x 32, to get to your desired length. Using newid()
will generate the unique bit sequence but it will only do so 128 bits at a time and a series of values is simply concatenated until there are enough. Then having a value to operate on, the main loop is essentially the same long division we learned from elementary school. The intermediate calculations are kept in place in the varbinary
array and the loop continues only until enough output characters are obtained. Each iteration determines another low order digit in the new base and this can terminate early since they won't change. The algorithm can't guarantee any global uniqueness if the output doesn't consume at least all of one newid()
, so make sure log2(len(pool)) x output length is at least 128.
The target base, which is ultimately the length of the character pool, can't be more than 256. I hard-coded a limitation by setting the 128-byte maximum length of @e
. For the question @e
only needs to be 32 bytes long and it could be adjusted upward or downward as necessary or just defined as varbinary(max)
. If you need something more truly random you could find another source for the entropy bits like crypt_gen_random()
. Since uniqueness appears to be the primary concern this answer fits that requirement. And by the way, repeating characters in the pool will naturally open the door for collisions.
This is fast and generic and it can be easily wrapped up in a function. And a more robust implementation would handle these extra checks.
declare @characterPool varchar(256) =
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
declare @outputLength int = 32;
declare @n int = 0; /* counter */
declare @numLoops int = ceiling(log(len(@characterPool)) / log(2) * @outputLength / 128)
declare @e varbinary(128) = 0x; /* entropy */
while @n < @numLoops
begin
set @e = cast(newid() as binary(16)); set @n += 1;
end
declare @b int; /* byte */
declare @d int; /* dividend */
declare @out varchar(128) = '';
declare @outputBase int = len(@characterPool);
declare @entropyBytes int = len(@e);
declare @m int = 0;
while @m < @outputLength
begin
set @b = 0; set @d = 0; set @n = 0;
while @n < @entropyBytes /* big-endian */
begin
set @b = (@b - @d * @outputBase) * 256 + cast(substring(@e, @n + 1, 1) as int);
set @d = @b / @outputBase;
set @e = cast(stuff(@e, @n + 1, 1, cast(@d as binary(1))) as varbinary(128));
set @n += 1;
end
set @out = substring(@characterPool, @b - @d * @outputBase + 1, 1) + @out;
set @m += 1;
end
select @out as "UniqueString"
http://rextester.com/EYAK79470
As one simple test of the algorithm you could just assign a known value in hexadecimal format and confirm that the output (using 012345678ABCDEF
as the character pool) is the same hexadecimal value. In the same way this obviously works with base64, binary and octal.
Update: The main loop can be made faster by not having to iterate over more bytes than necessary. I don't know how crypt_gen_random()
compares to newid()
in terms of speed or CPU usage so this change might not even be a net positive so I'll just note it as an alternative to explore. You will want to keep the bytes from newid
on the little end and attach the rest to the front.
declare @e varbinary(1024) = cast(newid() as binary(16));
declare @padBytes int = ceiling(log(len(@characterPool)) / log(2) * @outputLength) - 128;
if @padBytes > 0 set @e = crypt_gen_random(@padBytes) + @e; /* big end plus little end */
Upvotes: 1
Reputation: 31879
First, you need to split the string into separate rows. Then, do a SELECT
with ORDER BY NEWID()
for the random sort. Finally, use FOR XML PATH('')
to concatenate them back:
DECLARE @str VARCHAR(100) = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
;WITH E1(N) AS( -- 10 ^ 1 = 10 rows
SELECT 1 FROM(VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))t(N)
),
E2(N) AS(SELECT 1 FROM E1 a CROSS JOIN E1 b), -- 10 ^ 2 = 100 rows
E4(N) AS(SELECT 1 FROM E2 a CROSS JOIN E2 b), -- 10 ^ 4 = 10,000 rows
CteTally(N) AS(
SELECT TOP(LEN(@str)) ROW_NUMBER() OVER(ORDER BY(SELECT NULL))
FROM E4
)
SELECT (
SELECT TOP(32)
SUBSTRING(@str, N, 1)
FROM CteTally t
ORDER BY NEWID()
FOR XML PATH('')
) AS Result
The above is more of a generic random string generator. You can modify it to suit your need. If the requirement will not change, you can simply use this:
DECLARE @str VARCHAR(100) = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
;WITH E1(N) AS( -- 52 Rows
SELECT 1 FROM( VALUES
(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),
(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),
(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),
(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),
(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),
(1),(1)
)t(N)
),
CteTally(N) AS(
SELECT ROW_NUMBER() OVER(ORDER BY(SELECT NULL))
FROM E1
)
SELECT (
SELECT TOP(32)
SUBSTRING(@str, N, 1)
FROM CteTally t
ORDER BY NEWID()
FOR XML PATH('')
) AS Result
Upvotes: 4