Matan L
Matan L

Reputation: 1027

Efficient SQL Server stored procedure

I am using SQL Server 2008 and running the following stored procedure that needs to "clean" a 70 mill table from about 50 mill rows to another table, the id_col is integer (primary identity key)

According to the last running I made it is working good but it is expected to last for about 200 days:

SET NOCOUNT ON

    -- define the last ID handled
    DECLARE @LastID integer
    SET @LastID = 0
    declare @tempDate datetime
    set @tempDate = dateadd(dd,-20,getdate())
    -- define the ID to be handled now
    DECLARE @IDToHandle integer
    DECLARE @iCounter integer
    DECLARE @watch1 nvarchar(50)
    DECLARE @watch2 nvarchar(50)
    set @iCounter = 0
    -- select the next  to handle    
    SELECT TOP 1 @IDToHandle = id_col
    FROM MAIN_TABLE
    WHERE id_col> @LastID and DATEDIFF(DD,someDateCol,otherDateCol) < 1
        and datediff(dd,someDateCol,@tempDate) > 0 and (some_other_int_col = 1745 or some_other_int_col = 1548 or some_other_int_col = 4785)
    ORDER BY id_col

    -- as long as we have s......    
    WHILE @IDToHandle IS NOT NULL
    BEGIN
        IF ((select count(1) from SOME_OTHER_TABLE_THAT_CONTAINS_20k_ROWS where some_int_col = @IDToHandle) = 0 and (select count(1) from A_70k_rows_table where some_int_col =@IDToHandle )=0)
        BEGIN
            INSERT INTO SECONDERY_TABLE
            SELECT col1,col2,col3.....
            FROM MAIN_TABLE WHERE id_col = @IDToHandle

            EXEC    [dbo].[DeleteByID] @ID = @IDToHandle --deletes the row from 2 other tables that is related to the MAIN_TABLE and than from the MAIN_TABLE
            set @iCounter = @iCounter +1
        END
        IF (@iCounter % 1000 = 0)
        begin
            set @watch1 = 'iCounter - ' + CAST(@iCounter AS VARCHAR)
            set @watch2 = 'IDToHandle - '+ CAST(@IDToHandle AS VARCHAR)
            raiserror ( @watch1, 10,1) with nowait
            raiserror (@watch2, 10,1) with nowait
        end
        -- set the last  handled to the one we just handled
        SET @LastID = @IDToHandle
        SET @IDToHandle = NULL

        -- select the next  to handle    
        SELECT TOP 1 @IDToHandle = id_col
        FROM MAIN_TABLE
        WHERE id_col> @LastID and DATEDIFF(DD,someDateCol,otherDateCol) < 1
            and datediff(dd,someDateCol,@tempDate) > 0 and (some_other_int_col = 1745 or some_other_int_col = 1548 or some_other_int_col = 4785)
        ORDER BY id_col
    END

Any ideas or directions to improve this procedure run-time will be welcomed

Upvotes: 1

Views: 483

Answers (1)

Charles Bretana
Charles Bretana

Reputation: 146499

Yes, try this:

Declare @Ids Table (id int Primary Key not Null)
Insert @Ids(id)
Select id_col
From MAIN_TABLE m
Where someDateCol >= otherDateCol
    And someDateCol < @tempDate -- If there are times in these datetime fields, 
                                -- then you may need to modify this condition.
    And some_other_int_col In (1745, 1548, 4785)
    And Not exists (Select * from SOME_OTHER_TABLE_THAT_CONTAINS_20k_ROWS
                    Where some_int_col = m.id_col)
    And Not Exists (Select * From A_70k_rows_table
                    Where some_int_col = m.id_col)
Select id from @Ids  -- this to confirm above code generates the correct list of Ids
return -- this line to stop (Not do insert/deletes) until you have verified @Ids is correct
-- Once you have verified that above @Ids is correctly populated, 
-- then delete or comment out the select and return lines above so insert runs.

      Begin Transaction
      Delete OT     -- eliminate row-by-row call to second stored proc
      From OtherTable ot
         Join MAIN_TABLE m On m.id_col = ot.FKCol
         Join @Ids i On i.Id = m.id_col 

      Insert SECONDERY_TABLE(col1, col2, etc.)
      Select col1,col2,col3.....
      FROM MAIN_TABLE m Join @Ids i On i.Id = m.id_col 

      Delete m   -- eliminate row-by-row call to second stored proc
      FROM MAIN_TABLE m 
      Join @Ids i On i.Id = m.id_col 

      Commit Transaction

Explaanation.

  1. You had numerous filtering conditions that were not SARGable, i.e., they would force a complete table scan for every iteration of your loop, instead of being able to use any existing index. Always try to avoid filter conditions that apply processing logic to a table column value before comparing it to some other value. This eliminates the opportunity for the query optimizer to use an index.

  2. You were executing the inserts one at a time... Way better to generate a list of PK Ids that need to be processed (all at once) and then do all the inserts at once, in one statement.

Upvotes: 3

Related Questions