Reputation: 6402
I need to select 10 random elements from a table.I know how to do that, else this question have been answered a million times here on SO. By my problem is the randomisation isn't good enough
I've sat up the test case showing my problem :
DECLARE @Random TABLE
(
Id int,
[Count] int
)
DECLARE @TestData TABLE
(
Id int
)
declare @runs int = 0;
WHILE (@runs <=800)
begin
insert into @TestData values(@runs)
set @runs = @runs +1
end;
set @runs = 0
WHILE (@runs <=100)
begin
MERGE @Random AS target
-- USING (SELECT ID FROM @TestData where 0.01 >= CAST(CHECKSUM(NEWID(), id) & 0x7fffffff AS float) / CAST (0x7fffffff AS int) )
-- USING (SELECT top 10 ID FROM @TestData order by newid())
USING (SELECT top 10 ID FROM @TestData order by abs(checksum(newid())) % 100)
AS SOURCE
ON (target.id = source.id)
WHEN MATCHED THEN
UPDATE SET Target.[Count] = Target.[Count] + 1
WHEN NOT MATCHED THEN
INSERT (ID, [Count]) VALUES (source.ID, 1);
set @runs = @runs +1
end
select [count], count(*) "count(*)" from @Random group by [count] order by 1 desc
As you can see I've tried several methods in order of randomisation. But every time I end up with a result like this :
So in short how do I select really random elements from a table?
Scope: SQL Server 2017, so every language feature is acceptable
Upvotes: 0
Views: 108
Reputation: 22743
The issue is with your output query I believe, although this answer doesn't verify the randomness it should show that it's pretty random.
Firstly, don't use Keywords such as COUNT
as column names if you can help it. That's what has confused your output here.
Run this sample with 10000 runs and you should get a random set or results, but I'm not claiming it's fully randomised:
DECLARE @Random TABLE
(
Id INT ,
Occurences INT
);
DECLARE @TestData TABLE
(
Id INT
);
DECLARE @runs INT = 0;
WHILE ( @runs <= 800 )
BEGIN
INSERT INTO @TestData
VALUES ( @runs );
SET @runs = @runs + 1;
END;
SET @runs = 0;
WHILE ( @runs <= 10000 )
BEGIN
MERGE @Random AS target
USING ( SELECT TOP 10 Id
FROM @TestData
ORDER BY ABS(CHECKSUM(NEWID())) % 100 ) AS SOURCE
ON ( target.Id = SOURCE.Id )
WHEN MATCHED THEN
UPDATE SET target.Occurences = target.Occurences + 1
WHEN NOT MATCHED THEN INSERT ( Id ,
Occurences )
VALUES ( SOURCE.Id, 1 );
SET @runs = @runs + 1;
END;
SELECT Id ,
Occurences
FROM @Random
ORDER BY Id;
NOTE: this should help you further your investigation but is not proof of the randomness. Further testing should be carried out.
Upvotes: 3