dk96m
dk96m

Reputation: 331

SQL Server cursor replacement (Update from another question)

This is sort of a follow up question I asked here: SQL Server better way to iterate through millions of rows. I was told in that thread to submit a new question because what I really was looking for was different than the way I asked it.

I was not initially clear in my OP that this procedure will be run periodically. So this will not be run one time, it will be run weekly and the procedure needs to be able to check the counter table and the temp table.

I am working with SAP Timesheet data, so there are millions of rows. What I am trying to do is select the data from the SAP table and insert it into a table on MS SQL Server.

So I want to insert the original record(if it doesnt already exist), then if an update to the original record happens, which is in the form of a new SAP record with a refcounter, I want to find the original record in my table and update it, keeping the original counter value.

So I have done this successfully with a cursor (I know not the best), but with millions of records, I am wondering if there is a faster way, because I am on day 4 of my cursor running. Is there a better way then what I have below:

BEGIN
    CREATE TABLE CATSDB 
        (
            [COUNTER] nvarchar(12),
            REFCOUNTER nvarchar(12),
            PERNR nvarchar(8),
            WORKDATE nvarchar(8),
            CATSHOURS decimal(7, 3),
            APDAT nvarchar(8),
            LAETM nvarchar(6),
            CATS_STATUS nvarchar(2),
            APPR_STATUS nvarchar(2)
        )   

    INSERT INTO CATSDB
            (
                [COUNTER],REFCOUNTER,PERNR,WORKDATE,CATSHOURS,APDAT,LAETM,CATS_STATUS,APPR_STATUS
            )
        VALUES
            ('000421692670',NULL,'00000071','20190114','6.00','20190204','174541','30','30'),
            ('000421692671',NULL,'00000071','20190114','3.00','20190204','174541','30','30'),
            ('000421692672',NULL,'00000071','20190115','6.00','00000000','000000','60','20'),
            ('000421692673',NULL,'00000071','20190115','3.00','00000000','000000','60','20'),
            ('000421692712','000421692672','00000071','20190115','0.00','20190115','111007','30','30'),
            ('000421692713','000421692673','00000071','20190115','0.00','20190115','111007','30','30'),
            ('000429718015',NULL,'00000072','20190313','7.00','00000000','000000','60','20'),
            ('000429718016',NULL,'00000072','20190313','1.50','20190315','164659','30','30'),
            ('000429718017',NULL,'00000072','20190313','1.00','20190315','164659','30','30'),
            ('000430154143',NULL,'00000072','20190313','2.00','00000000','000000','60','20'),
            ('000430154142','000429718015','00000072','20190313','5.00','00000000','000000','60','20'),
            ('000430154928','000430154142','00000072','20190313','4.50','20190315','164659','30','30'),
            ('000430154929','000430154143','00000072','20190313','2.50','20190315','164659','30','30'),
            ('000429774620',NULL,'00000152','20190314','1.00','00000000','000000','60','20'),
            ('000429774619',NULL,'00000152','20190314','1.00','00000000','000000','60','20'),
            ('000429802106','000429774620','00000152','20190314','2.00','00000000','000000','60','20'),
            ('000429802105','000429774619','00000152','20190314','3.00','00000000','000000','60','20'),
            ('000429840242','000429802106','00000152','20190314','4.00','20190315','143857','30','30'),
            ('000429840241','000429802105','00000152','20190314','5.00','20190315','143857','30','30')

    CREATE TABLE [TBL_COUNTER]
        (
            [COUNTER] [varchar](12) NOT NULL,
            [REFCOUNTER] [varchar](12) NULL
        )   

    CREATE TABLE TEMP
        (
            [COUNTER] [nvarchar](12) NOT NULL,
            [REFCOUNTER] [nvarchar](12) NULL,
            [PERNR] [nvarchar](8) NULL,
            [WORKDATE] [nvarchar](8) NULL,
            [CATSHOURS] [decimal](7, 3) NULL,
            [APDAT] [nvarchar](8) NULL,
            [LAETM] [nvarchar](6) NULL,
            [CATS_STATUS] [nvarchar](2) NULL,
            [APPR_STATUS] [nvarchar](2) NULL
        )       
END

BEGIN
    DECLARE     @COUNTER nvarchar(12),  
                @REFCOUNTER nvarchar(12),   
                @PERNR nvarchar(8), 
                @WORKDATE nvarchar(8),  
                @CATSHOURS decimal(7, 3),
                @APDAT nvarchar(8),
                @LAETM nvarchar(6),
                @CATS_STATUS nvarchar(2),
                @APPR_STATUS nvarchar(2)

    DECLARE @orig_counter nvarchar(12)
END

--The below cursor will include a where statement where it selects data
--where the workdate or apdat fall within a data range.  Since I am
--running this weekly, the to and from dates will be the last work week.
BEGIN
    DECLARE curs CURSOR FOR
        SELECT 
                [COUNTER],
                REFCOUNTER,
                PERNR,
                WORKDATE,
                CATSHOURS,
                APDAT,
                LAETM,
                CATS_STATUS,
                APPR_STATUS
        FROM 
                CATSDB
END

BEGIN
    OPEN curs
END

BEGIN
    FETCH NEXT FROM curs INTO
        @COUNTER,
        @REFCOUNTER,
        @PERNR,
        @WORKDATE,
        @CATSHOURS,
        @APDAT,
        @LAETM,
        @CATS_STATUS,
        @APPR_STATUS
END

BEGIN
    WHILE @@FETCH_STATUS = 0
        BEGIN
            BEGIN
                IF NOT EXISTS (SELECT * FROM TBL_COUNTER WHERE [COUNTER] = @COUNTER)
                    BEGIN
                        INSERT INTO TBL_COUNTER
                                ([COUNTER]
                                ,REFCOUNTER)
                            VALUES
                                (@COUNTER
                                ,@REFCOUNTER)
                    END
            END
            BEGIN
                IF NOT EXISTS (SELECT * FROM TEMP WHERE [COUNTER] = @COUNTER)
                    BEGIN
                            --If REFCOUNTER is populated, get the original COUNTER value, then update that row with the new values. Otherwise insert new record
                            IF @REFCOUNTER <> '' AND @REFCOUNTER IS NOT NULL
                                BEGIN
                                    BEGIN
                                        WITH n([COUNTER], REFCOUNTER) AS 
                                            (
                                                SELECT 
                                                        cnt.[COUNTER], 
                                                        cnt.REFCOUNTER 
                                                FROM 
                                                        TBL_COUNTER cnt
                                                WHERE 
                                                        cnt.[COUNTER] = @REFCOUNTER
                                            UNION ALL
                                                SELECT 
                                                        nplus1.[COUNTER], 
                                                        nplus1.REFCOUNTER 
                                                FROM 
                                                        TBL_COUNTER as nplus1, 
                                                        n
                                                WHERE 
                                                        n.[COUNTER] = nplus1.REFCOUNTER
                                            )
                                        SELECT @orig_counter = [COUNTER] FROM n WHERE REFCOUNTER = '' OR REFCOUNTER IS NULL
                                    END
                                    BEGIN
                                        UPDATE TEMP
                                           SET 
                                               [REFCOUNTER] = @REFCOUNTER
                                              ,[PERNR] = @PERNR 
                                              ,[WORKDATE] = @WORKDATE                                               
                                              ,[CATSHOURS] = @CATSHOURS                                                                                    
                                              ,[APDAT] = @APDAT                                        
                                              ,[LAETM] = @LAETM
                                              ,[CATS_STATUS] = @CATS_STATUS
                                              ,[APPR_STATUS] = @APPR_STATUS                                        
                                            WHERE [COUNTER] = @orig_counter
                                    END
                                END
                            ELSE
                                BEGIN
                                    INSERT INTO TEMP
                                               ([COUNTER]
                                               ,[REFCOUNTER]                                               
                                               ,[PERNR]                                               
                                               ,[WORKDATE]                                               
                                               ,[CATSHOURS]                                             
                                               ,[APDAT]                                              
                                               ,[LAETM]
                                               ,[CATS_STATUS]                                               
                                               ,[APPR_STATUS])                                              
                                         VALUES
                                               (@COUNTER
                                               ,@REFCOUNTER                                              
                                               ,@PERNR                                               
                                               ,@WORKDATE                                             
                                               ,@CATSHOURS                                               
                                               ,@APDAT                                               
                                               ,@LAETM                                               
                                               ,@CATS_STATUS                                               
                                               ,@APPR_STATUS)                                               
                                END
                    END

            FETCH NEXT FROM curs INTO
                @COUNTER,
                @REFCOUNTER,
                @PERNR,
                @WORKDATE,
                @CATSHOURS,
                @APDAT,
                @LAETM,
                @CATS_STATUS,
                @APPR_STATUS
        END
    END
END

BEGIN
    CLOSE curs
    DEALLOCATE curs
END

I shortened it and created the tables for you all to be able to see what is going on. The expected result is

+--------------+--------------+----------+----------+-----------+----------+--------+-------------+-------------+
|   COUNTER    |  REFCOUNTER  |  PERNR   | WORKDATE | CATSHOURS |  APDAT   | LAETM  | CATS_STATUS | APPR_STATUS |
+--------------+--------------+----------+----------+-----------+----------+--------+-------------+-------------+
| 000421692670 | NULL         | 00000071 | 20190114 |      6.00 | 20190204 | 174541 |          30 |          30 |
| 000421692671 | NULL         | 00000071 | 20190114 |      3.00 | 20190204 | 174541 |          30 |          30 |
| 000421692672 | 000421692672 | 00000071 | 20190115 |      0.00 | 20190115 | 111007 |          30 |          30 |
| 000421692673 | 000421692673 | 00000071 | 20190115 |      0.00 | 20190115 | 111007 |          30 |          30 |
| 000429718015 | 000430154142 | 00000072 | 20190313 |      4.50 | 20190315 | 164659 |          30 |          30 |
| 000429718016 | NULL         | 00000072 | 20190313 |      1.50 | 20190315 | 164659 |          30 |          30 |
| 000429718017 | NULL         | 00000072 | 20190313 |       1.0 | 20190315 | 164659 |          30 |          30 |
| 000430154143 | 000430154143 | 00000072 | 20190313 |      2.50 | 20190315 | 164659 |          30 |          30 |
| 000429774620 | 000429774620 | 00000152 | 20190314 |      2.00 | 00000000 | 000000 |          60 |          20 |
| 000429774619 | 000429802105 | 00000152 | 20190314 |      5.00 | 20190315 | 143857 |          30 |          30 |
+--------------+--------------+----------+----------+-----------+----------+--------+-------------+-------------+

I will pull the data from the origin source for new records and changed records from the last time i ran it. So I will not have the full chain every week. There needs to be a way to get back to the original counter value, without the full dataset, which is why i had the counter table. I apologize for not being more clear.

I hope what I am trying to do is clear. As i stated, i have it working with a cursor, but it is so slow. Processing a week work of data is taking at least 24 hours.

Upvotes: 1

Views: 295

Answers (1)

Neville Kuyt
Neville Kuyt

Reputation: 29629

Your code is fairly long and complicated, and I'm not sure I fully understand what it does. However, I can show you the way to think about this in a more "relational database" way.

Your cursor iterates through millions of records, and for each record, executes a fairly simple if statement, and where that yields true, it does something.

In a database, you can replace that if for a single row in the cursor with a where against the entire database. As long as your where clause hits an index, it should be very fast.

For example, your first step is make sure that the current @counter is present in the table COUNTER:

IF NOT EXISTS (SELECT * FROM TBL_COUNTER WHERE [COUNTER] = @COUNTER)
BEGIN
   INSERT INTO TBL_COUNTER
   ([COUNTER]
   ,REFCOUNTER)
   VALUES
   (@COUNTER
   ,@REFCOUNTER)
END

This code executes for every row in your cursor (millions of times). Instead, outside the cursor, just once, you can run the following query:

INSERT INTO TBL_COUNTER
   ([COUNTER]
   ,REFCOUNTER)
SELECT [COUNTER]
       REFCOUNTER
FROM CATSDB
WHERE COUNTER NOT IN 
  (SELECT COUNTER FROM TBL_COUNTER)

This replaces inspecting every single row, one by one, with a where clause against the entire dataset.

You can take a similar approach with the next statement - the same principle applies.

IF NOT EXISTS (SELECT * FROM TEMP WHERE [COUNTER] = @COUNTER) BEGIN --If REFCOUNTER is populated, get the original COUNTER value, then update that row with the new values. Otherwise insert new record

   IF @REFCOUNTER <> '' AND @REFCOUNTER IS NOT NULL
   ....
   ELSE
   BEGIN
    INSERT INTO TEMP
      ([COUNTER]
       ,[REFCOUNTER]                                               
       ,[PERNR]                                               
       ,[WORKDATE]                                               
       ,[CATSHOURS]                                             
       ,[APDAT]                                              
       ,[LAETM]
       ,[CATS_STATUS]                                               
       ,[APPR_STATUS])                                              
       VALUES
         (@COUNTER
         ,@REFCOUNTER                                              
         ,@PERNR                                               
         ,@WORKDATE                                             
         ,@CATSHOURS                                               
         ,@APDAT                                               
         ,@LAETM                                               
         ,@CATS_STATUS                                               
         ,@APPR_STATUS)                                               
    END

You can re-write this as:

INSERT INTO TEMP
          ([COUNTER]
           ,[REFCOUNTER]                                               
           ,[PERNR]                                               
           ,[WORKDATE]                                               
           ,[CATSHOURS]                                             
           ,[APDAT]                                              
           ,[LAETM]
           ,[CATS_STATUS]                                               
           ,[APPR_STATUS])   
    SELECT 
 [COUNTER],
 REFCOUNTER,
 PERNR,
 WORKDATE,
 CATSHOURS,
 APDAT,
 LAETM,
 CATS_STATUS,
 APPR_STATUS
 FROM 
 CATSDB
  --- We want the opposite of IF @REFCOUNTER <> '' AND @REFCOUNTER IS NOT NULL
  WHERE @REFCOUNTER = '' OR @REFCOUNTER IS NULL

It may well be that this gives you enough of a speed boost (you have to adjust your cursor to exclude those cases, of course).

Finally, the clause with the CTE. It's late, I may be missing, something, but the comment says:

 --If REFCOUNTER is populated, get the original COUNTER value, then update that row with the new values. Otherwise insert new record

I think you can achieve that with:

 UPDATE TEMP
 SET 
 [REFCOUNTER] = c.REFCOUNTER
 ,[PERNR] = c.PERNR 
 ,[WORKDATE] = c.WORKDATE                                               
 ,[CATSHOURS] = c.CATSHOURS                                                                                    
 ,[APDAT] = c.APDAT                                        
 ,[LAETM] = c.LAETM
 ,[CATS_STATUS] = c.CATS_STATUS
 ,[APPR_STATUS] = c.APPR_STATUS                                        
 from TEMP t, 
 CATSDB c
 WHERE t.[COUNTER] = C.REFCOUNTER

I think I'm missing what the second part of your UNION in the CTE is achieving.

Upvotes: 1

Related Questions