praveen
praveen

Reputation: 12291

Passing a table to a stored procedure

I have a table with 20 billion rows. Table does not have any indexes as it was created on fly for doing bulk insert operation. The table is being used in a stored procedure which does the following operation

Delete A
from master a 
inner join (Select distinct Col from TableB ) b
on A.Col = B.Col

Insert into master 
Select *
from tableB
group by col1,col2,col3

TableB is the one which has 20 billion rows. I don't want to execute SP directly because it might take days to complete the execution. Master is also a huge table and has clustered index on Col

  1. Can i pass chunks of rows to the stored procedure and perform the operation.This might reduce the log file growth. If yes how can i do that
  2. Should i create clustered index on the table and execute the SP which might be little faster but then again i think creating CI on a huge table might take 10 hours to complete.

Or is there any way to perform this operation fast

Upvotes: 1

Views: 164

Answers (2)

Bacon Bits
Bacon Bits

Reputation: 32230

I've used a method similar to this one. I'd recommend putting your DB into Bulk Logged recovery mode instead of Full recovery mode if you can.

Blog entry reproduced below to future proof it.

Below is a technique used to transfer a large amount of records from one table to another. This scales pretty well for a couple reasons. First, this will not fill up the entire log prior to committing the transaction. Rather, it will populate the table in chunks of 10,000 records. Second, it’s generally much quicker. You will have to play around with the batch size. Sometimes it’s more efficient at 10,000, sometimes 500,000, depending on the system.

If you do not need to insert into an existing table and just need a copy of the table, it is better to do a SELECT INTO. However for this example, we are inserting into an existing table.

Another trick you should do is to change the recovery model of the database to simple. This way, there will be much less logging in the transaction log.

The WITH (TABLOCK) below only works in SQL 2008.

DECLARE @BatchSize INT = 10000

WHILE 1 = 1
BEGIN

    INSERT INTO [dbo].[Destination] --WITH (TABLOCK)  -- Uncomment for 2008
    (
        FirstName
        ,LastName
        ,EmailAddress
        ,PhoneNumber
    )
    SELECT TOP(@BatchSize) 
        s.FirstName
        ,s.LastName
        ,s.EmailAddress
        ,s.PhoneNumber
    FROM [dbo].[SOURCE] s
    WHERE NOT EXISTS ( 
        SELECT 1
        FROM dbo.Destination
        WHERE PersonID = s.PersonID
    )

    IF @@ROWCOUNT < @BatchSize BREAK

END

With the above example, it is important to have at least a non clustered index on PersonID in both tables.

Another way to transfer records is to use multiple threads. Specifying a range of records as such:

INSERT INTO [dbo].[Destination]
    (
        FirstName
        ,LastName
        ,EmailAddress
        ,PhoneNumber
    )
    SELECT TOP(@BatchSize) 
        s.FirstName
        ,s.LastName
        ,s.EmailAddress
        ,s.PhoneNumber
    FROM [dbo].[SOURCE] s
    WHERE PersonID BETWEEN 1 AND 5000
GO
INSERT INTO [dbo].[Destination]
    (
        FirstName
        ,LastName
        ,EmailAddress
        ,PhoneNumber
    )
    SELECT TOP(@BatchSize) 
        s.FirstName
        ,s.LastName
        ,s.EmailAddress
        ,s.PhoneNumber
    FROM [dbo].[SOURCE] s
    WHERE PersonID BETWEEN 5001 AND 10000

For super fast performance however, I’d recommend using SSIS. Especially in SQL Server 2008. We recently transferred 17 million records in 5 minutes with an SSIS package executed on the same server as the two databases it transferred between.

SQL Server 2008 SQL Server 2008 has made changes with regards to it’s logging mechanism when inserting records. Previously, to do an insert that was minimally logged, you would have to perform a SELECT.. INTO. Now, you can perform a minimally logged insert if you can lock the table you are inserting into. The example below shows an example of this. The exception to this rule is if you have a clustered index on the table AND the table is not empty. If the table is empty and you acquire a table lock and you have a clustered index, it will be minimally logged. However if you have data in the table, the insert will be logged. Now if you have a non clustered index on a heap and you acquire a table lock then only the non clustered index will be logged. It is always better to drop indexes prior to inserting records.

To determine the amount of logging you can use the following statement

 SELECT * FROM ::fn_dblog(NULL, NULL)

Credit for above goes to Derek Dieter at SQL Server Planet.

Upvotes: 3

Peter G
Peter G

Reputation: 1623

If you're dead set on passing a table to your stored procedure, you can pass a table-valued parameter to a stored procedure in SQL Server 2008. You might have better luck with some other approaches suggested, like partitioning. Select distinct on a table with 20 billion rows might be part of the problem. I wonder if some very basic tuning wouldn't help, too:

Delete A
from master a 
where exists (select 1 from TableB b where b.Col = a.Col)

Upvotes: 1

Related Questions