user3416894
user3416894

Reputation: 55

Faster way to insert records into database

So I currently have a database table of about 70,000 names. What I want to do is take 3000 random records from that database and insert them into another table where each name has a row for all the other names. In other words, the new table should look like this:

John, jerry
john, alex
john, sam
jerry, alex
jerry, sam
alex, sam

This means that I should be adding summation n rows to the table. My current strategy is to use two nested for loops to add these rows one at a time and then removing the first name from the list of names to add in order to ensure I dont have a duplicate record with different ordering.

My question is this: is there a faster way to do this, perhaps through parallel for loops or PLINQ or some other option that I a have not mentioned?

Upvotes: 0

Views: 929

Answers (3)

StrayCatDBA
StrayCatDBA

Reputation: 2880

Using a Number table to simulate names.

single query, using a triangular join

WITH all_names 
     AS (SELECT n, 
                'NAME_' + Cast(n AS VARCHAR(20)) NAME 
         FROM   number 
         WHERE  n < 70000), 
     rand_names 
     AS (SELECT TOP 3000 * 
         FROM   all_names 
         ORDER  BY Newid()), 
     ordered_names 
     AS (SELECT Row_number() 
                  OVER ( 
                    ORDER BY NAME) rw_num, 
                NAME 
         FROM   rand_names) 
SELECT n1.NAME, 
       n2.NAME 
FROM   ordered_names n1 
       INNER JOIN ordered_names n2 
               ON n2.rw_num > n1.rw_num   

Upvotes: 0

paparazzo
paparazzo

Reputation: 45096

You will need to figure out the random part

select t1.name, t2.name 
from table t1 
join table t2 
on t1.name < t2.name 
order by t1.name, t2.name

You need to materialize the newid

declare @t table (name varchar(10) primary key);
insert into @t (name) values 
       ('Adam')
     , ('Bob')
     , ('Charlie')
     , ('Den')
     , ('Eric')
     , ('Fred');
declare @top table (name varchar(10) primary key);
insert into @top (name)
select top (4) name from @t order by NEWID();

select * from @top;

select a.name, b.name
from @top a  
join @top b 
  on a.name < b.name  
order by a.name, b.name;

Upvotes: 1

Andrew Morton
Andrew Morton

Reputation: 25013

Given a table "Names" with an nvarchar(50) column "Name" with this data:

Adam
Bob
Charlie
Den
Eric
Fred

This query:

-- Work out the fraction we need
DECLARE @frac AS float;
SELECT @frac = CAST(35000 AS float) / 70000;

-- Get roughly that sample size
WITH ts AS (
SELECT Name FROM Names
WHERE @frac >= CAST(CHECKSUM(NEWID(), Name) & 0x7FFFFFFF AS float) / CAST (0X7FFFFFFF AS int)
)

-- Match each entry in the sample with all the other entries
SELECT x.Name + ', ' + y.Name
FROM ts AS X
CROSS JOIN
Names AS Y
WHERE x.Name <> y.Name

produces results of the form

Adam, Bob
Adam, Charlie
Adam, Den
Adam, Eric
Adam, Fred
Charlie, Adam
Charlie, Bob
Charlie, Den
Charlie, Eric
Charlie, Fred
Den, Adam
Den, Bob
Den, Charlie
Den, Eric
Den, Fred

The results will vary by run; a sample of 3000 out of 70000 will have approximately 3000 * 70000 result rows. I used 35000./70000 because the sample size I used was only 6.

If you want only the names from the sample used, change CROSS JOIN Names AS Y to CROSS JOIN ts AS Y, and there will then be approximately 3000 * 3000 result rows.

Reference: The random sample method was taken from the section "Important" in Limiting Result Sets by Using TABLESAMPLE.

Upvotes: 1

Related Questions