Reputation: 55
So I currently have a database table of about 70,000 names. What I want to do is take 3000 random records from that database and insert them into another table where each name has a row for all the other names. In other words, the new table should look like this:
John, jerry
john, alex
john, sam
jerry, alex
jerry, sam
alex, sam
This means that I should be adding summation n rows to the table. My current strategy is to use two nested for loops to add these rows one at a time and then removing the first name from the list of names to add in order to ensure I dont have a duplicate record with different ordering.
My question is this: is there a faster way to do this, perhaps through parallel for loops or PLINQ or some other option that I a have not mentioned?
Upvotes: 0
Views: 929
Reputation: 2880
Using a Number table to simulate names.
single query, using a triangular join
WITH all_names
AS (SELECT n,
'NAME_' + Cast(n AS VARCHAR(20)) NAME
FROM number
WHERE n < 70000),
rand_names
AS (SELECT TOP 3000 *
FROM all_names
ORDER BY Newid()),
ordered_names
AS (SELECT Row_number()
OVER (
ORDER BY NAME) rw_num,
NAME
FROM rand_names)
SELECT n1.NAME,
n2.NAME
FROM ordered_names n1
INNER JOIN ordered_names n2
ON n2.rw_num > n1.rw_num
Upvotes: 0
Reputation: 45096
You will need to figure out the random part
select t1.name, t2.name
from table t1
join table t2
on t1.name < t2.name
order by t1.name, t2.name
You need to materialize the newid
declare @t table (name varchar(10) primary key);
insert into @t (name) values
('Adam')
, ('Bob')
, ('Charlie')
, ('Den')
, ('Eric')
, ('Fred');
declare @top table (name varchar(10) primary key);
insert into @top (name)
select top (4) name from @t order by NEWID();
select * from @top;
select a.name, b.name
from @top a
join @top b
on a.name < b.name
order by a.name, b.name;
Upvotes: 1
Reputation: 25013
Given a table "Names" with an nvarchar(50) column "Name" with this data:
Adam
Bob
Charlie
Den
Eric
Fred
This query:
-- Work out the fraction we need
DECLARE @frac AS float;
SELECT @frac = CAST(35000 AS float) / 70000;
-- Get roughly that sample size
WITH ts AS (
SELECT Name FROM Names
WHERE @frac >= CAST(CHECKSUM(NEWID(), Name) & 0x7FFFFFFF AS float) / CAST (0X7FFFFFFF AS int)
)
-- Match each entry in the sample with all the other entries
SELECT x.Name + ', ' + y.Name
FROM ts AS X
CROSS JOIN
Names AS Y
WHERE x.Name <> y.Name
produces results of the form
Adam, Bob
Adam, Charlie
Adam, Den
Adam, Eric
Adam, Fred
Charlie, Adam
Charlie, Bob
Charlie, Den
Charlie, Eric
Charlie, Fred
Den, Adam
Den, Bob
Den, Charlie
Den, Eric
Den, Fred
The results will vary by run; a sample of 3000 out of 70000 will have approximately 3000 * 70000 result rows. I used 35000./70000 because the sample size I used was only 6.
If you want only the names from the sample used, change CROSS JOIN Names AS Y
to CROSS JOIN ts AS Y
, and there will then be approximately 3000 * 3000 result rows.
Reference: The random sample method was taken from the section "Important" in Limiting Result Sets by Using TABLESAMPLE.
Upvotes: 1