Reputation: 331
This is sort of a follow up question I asked here: SQL Server better way to iterate through millions of rows. I was told in that thread to submit a new question because what I really was looking for was different than the way I asked it.
I was not initially clear in my OP that this procedure will be run periodically. So this will not be run one time, it will be run weekly and the procedure needs to be able to check the counter table and the temp table.
I am working with SAP Timesheet data, so there are millions of rows. What I am trying to do is select the data from the SAP table and insert it into a table on MS SQL Server.
So I want to insert the original record(if it doesnt already exist), then if an update to the original record happens, which is in the form of a new SAP record with a refcounter
, I want to find the original record in my table and update it, keeping the original counter value.
So I have done this successfully with a cursor (I know not the best), but with millions of records, I am wondering if there is a faster way, because I am on day 4 of my cursor running. Is there a better way then what I have below:
BEGIN
CREATE TABLE CATSDB
(
[COUNTER] nvarchar(12),
REFCOUNTER nvarchar(12),
PERNR nvarchar(8),
WORKDATE nvarchar(8),
CATSHOURS decimal(7, 3),
APDAT nvarchar(8),
LAETM nvarchar(6),
CATS_STATUS nvarchar(2),
APPR_STATUS nvarchar(2)
)
INSERT INTO CATSDB
(
[COUNTER],REFCOUNTER,PERNR,WORKDATE,CATSHOURS,APDAT,LAETM,CATS_STATUS,APPR_STATUS
)
VALUES
('000421692670',NULL,'00000071','20190114','6.00','20190204','174541','30','30'),
('000421692671',NULL,'00000071','20190114','3.00','20190204','174541','30','30'),
('000421692672',NULL,'00000071','20190115','6.00','00000000','000000','60','20'),
('000421692673',NULL,'00000071','20190115','3.00','00000000','000000','60','20'),
('000421692712','000421692672','00000071','20190115','0.00','20190115','111007','30','30'),
('000421692713','000421692673','00000071','20190115','0.00','20190115','111007','30','30'),
('000429718015',NULL,'00000072','20190313','7.00','00000000','000000','60','20'),
('000429718016',NULL,'00000072','20190313','1.50','20190315','164659','30','30'),
('000429718017',NULL,'00000072','20190313','1.00','20190315','164659','30','30'),
('000430154143',NULL,'00000072','20190313','2.00','00000000','000000','60','20'),
('000430154142','000429718015','00000072','20190313','5.00','00000000','000000','60','20'),
('000430154928','000430154142','00000072','20190313','4.50','20190315','164659','30','30'),
('000430154929','000430154143','00000072','20190313','2.50','20190315','164659','30','30'),
('000429774620',NULL,'00000152','20190314','1.00','00000000','000000','60','20'),
('000429774619',NULL,'00000152','20190314','1.00','00000000','000000','60','20'),
('000429802106','000429774620','00000152','20190314','2.00','00000000','000000','60','20'),
('000429802105','000429774619','00000152','20190314','3.00','00000000','000000','60','20'),
('000429840242','000429802106','00000152','20190314','4.00','20190315','143857','30','30'),
('000429840241','000429802105','00000152','20190314','5.00','20190315','143857','30','30')
CREATE TABLE [TBL_COUNTER]
(
[COUNTER] [varchar](12) NOT NULL,
[REFCOUNTER] [varchar](12) NULL
)
CREATE TABLE TEMP
(
[COUNTER] [nvarchar](12) NOT NULL,
[REFCOUNTER] [nvarchar](12) NULL,
[PERNR] [nvarchar](8) NULL,
[WORKDATE] [nvarchar](8) NULL,
[CATSHOURS] [decimal](7, 3) NULL,
[APDAT] [nvarchar](8) NULL,
[LAETM] [nvarchar](6) NULL,
[CATS_STATUS] [nvarchar](2) NULL,
[APPR_STATUS] [nvarchar](2) NULL
)
END
BEGIN
DECLARE @COUNTER nvarchar(12),
@REFCOUNTER nvarchar(12),
@PERNR nvarchar(8),
@WORKDATE nvarchar(8),
@CATSHOURS decimal(7, 3),
@APDAT nvarchar(8),
@LAETM nvarchar(6),
@CATS_STATUS nvarchar(2),
@APPR_STATUS nvarchar(2)
DECLARE @orig_counter nvarchar(12)
END
--The below cursor will include a where statement where it selects data
--where the workdate or apdat fall within a data range. Since I am
--running this weekly, the to and from dates will be the last work week.
BEGIN
DECLARE curs CURSOR FOR
SELECT
[COUNTER],
REFCOUNTER,
PERNR,
WORKDATE,
CATSHOURS,
APDAT,
LAETM,
CATS_STATUS,
APPR_STATUS
FROM
CATSDB
END
BEGIN
OPEN curs
END
BEGIN
FETCH NEXT FROM curs INTO
@COUNTER,
@REFCOUNTER,
@PERNR,
@WORKDATE,
@CATSHOURS,
@APDAT,
@LAETM,
@CATS_STATUS,
@APPR_STATUS
END
BEGIN
WHILE @@FETCH_STATUS = 0
BEGIN
BEGIN
IF NOT EXISTS (SELECT * FROM TBL_COUNTER WHERE [COUNTER] = @COUNTER)
BEGIN
INSERT INTO TBL_COUNTER
([COUNTER]
,REFCOUNTER)
VALUES
(@COUNTER
,@REFCOUNTER)
END
END
BEGIN
IF NOT EXISTS (SELECT * FROM TEMP WHERE [COUNTER] = @COUNTER)
BEGIN
--If REFCOUNTER is populated, get the original COUNTER value, then update that row with the new values. Otherwise insert new record
IF @REFCOUNTER <> '' AND @REFCOUNTER IS NOT NULL
BEGIN
BEGIN
WITH n([COUNTER], REFCOUNTER) AS
(
SELECT
cnt.[COUNTER],
cnt.REFCOUNTER
FROM
TBL_COUNTER cnt
WHERE
cnt.[COUNTER] = @REFCOUNTER
UNION ALL
SELECT
nplus1.[COUNTER],
nplus1.REFCOUNTER
FROM
TBL_COUNTER as nplus1,
n
WHERE
n.[COUNTER] = nplus1.REFCOUNTER
)
SELECT @orig_counter = [COUNTER] FROM n WHERE REFCOUNTER = '' OR REFCOUNTER IS NULL
END
BEGIN
UPDATE TEMP
SET
[REFCOUNTER] = @REFCOUNTER
,[PERNR] = @PERNR
,[WORKDATE] = @WORKDATE
,[CATSHOURS] = @CATSHOURS
,[APDAT] = @APDAT
,[LAETM] = @LAETM
,[CATS_STATUS] = @CATS_STATUS
,[APPR_STATUS] = @APPR_STATUS
WHERE [COUNTER] = @orig_counter
END
END
ELSE
BEGIN
INSERT INTO TEMP
([COUNTER]
,[REFCOUNTER]
,[PERNR]
,[WORKDATE]
,[CATSHOURS]
,[APDAT]
,[LAETM]
,[CATS_STATUS]
,[APPR_STATUS])
VALUES
(@COUNTER
,@REFCOUNTER
,@PERNR
,@WORKDATE
,@CATSHOURS
,@APDAT
,@LAETM
,@CATS_STATUS
,@APPR_STATUS)
END
END
FETCH NEXT FROM curs INTO
@COUNTER,
@REFCOUNTER,
@PERNR,
@WORKDATE,
@CATSHOURS,
@APDAT,
@LAETM,
@CATS_STATUS,
@APPR_STATUS
END
END
END
BEGIN
CLOSE curs
DEALLOCATE curs
END
I shortened it and created the tables for you all to be able to see what is going on. The expected result is
+--------------+--------------+----------+----------+-----------+----------+--------+-------------+-------------+
| COUNTER | REFCOUNTER | PERNR | WORKDATE | CATSHOURS | APDAT | LAETM | CATS_STATUS | APPR_STATUS |
+--------------+--------------+----------+----------+-----------+----------+--------+-------------+-------------+
| 000421692670 | NULL | 00000071 | 20190114 | 6.00 | 20190204 | 174541 | 30 | 30 |
| 000421692671 | NULL | 00000071 | 20190114 | 3.00 | 20190204 | 174541 | 30 | 30 |
| 000421692672 | 000421692672 | 00000071 | 20190115 | 0.00 | 20190115 | 111007 | 30 | 30 |
| 000421692673 | 000421692673 | 00000071 | 20190115 | 0.00 | 20190115 | 111007 | 30 | 30 |
| 000429718015 | 000430154142 | 00000072 | 20190313 | 4.50 | 20190315 | 164659 | 30 | 30 |
| 000429718016 | NULL | 00000072 | 20190313 | 1.50 | 20190315 | 164659 | 30 | 30 |
| 000429718017 | NULL | 00000072 | 20190313 | 1.0 | 20190315 | 164659 | 30 | 30 |
| 000430154143 | 000430154143 | 00000072 | 20190313 | 2.50 | 20190315 | 164659 | 30 | 30 |
| 000429774620 | 000429774620 | 00000152 | 20190314 | 2.00 | 00000000 | 000000 | 60 | 20 |
| 000429774619 | 000429802105 | 00000152 | 20190314 | 5.00 | 20190315 | 143857 | 30 | 30 |
+--------------+--------------+----------+----------+-----------+----------+--------+-------------+-------------+
I will pull the data from the origin source for new records and changed records from the last time i ran it. So I will not have the full chain every week. There needs to be a way to get back to the original counter value, without the full dataset, which is why i had the counter table. I apologize for not being more clear.
I hope what I am trying to do is clear. As i stated, i have it working with a cursor, but it is so slow. Processing a week work of data is taking at least 24 hours.
Upvotes: 1
Views: 295
Reputation: 29629
Your code is fairly long and complicated, and I'm not sure I fully understand what it does. However, I can show you the way to think about this in a more "relational database" way.
Your cursor iterates through millions of records, and for each record, executes a fairly simple if
statement, and where that yields true, it does something.
In a database, you can replace that if
for a single row in the cursor with a where
against the entire database. As long as your where
clause hits an index, it should be very fast.
For example, your first step is make sure that the current @counter is present in the table COUNTER
:
IF NOT EXISTS (SELECT * FROM TBL_COUNTER WHERE [COUNTER] = @COUNTER)
BEGIN
INSERT INTO TBL_COUNTER
([COUNTER]
,REFCOUNTER)
VALUES
(@COUNTER
,@REFCOUNTER)
END
This code executes for every row in your cursor (millions of times). Instead, outside the cursor, just once, you can run the following query:
INSERT INTO TBL_COUNTER
([COUNTER]
,REFCOUNTER)
SELECT [COUNTER]
REFCOUNTER
FROM CATSDB
WHERE COUNTER NOT IN
(SELECT COUNTER FROM TBL_COUNTER)
This replaces inspecting every single row, one by one, with a where
clause against the entire dataset.
You can take a similar approach with the next statement - the same principle applies.
IF NOT EXISTS (SELECT * FROM TEMP WHERE [COUNTER] = @COUNTER) BEGIN --If REFCOUNTER is populated, get the original COUNTER value, then update that row with the new values. Otherwise insert new record
IF @REFCOUNTER <> '' AND @REFCOUNTER IS NOT NULL
....
ELSE
BEGIN
INSERT INTO TEMP
([COUNTER]
,[REFCOUNTER]
,[PERNR]
,[WORKDATE]
,[CATSHOURS]
,[APDAT]
,[LAETM]
,[CATS_STATUS]
,[APPR_STATUS])
VALUES
(@COUNTER
,@REFCOUNTER
,@PERNR
,@WORKDATE
,@CATSHOURS
,@APDAT
,@LAETM
,@CATS_STATUS
,@APPR_STATUS)
END
You can re-write this as:
INSERT INTO TEMP
([COUNTER]
,[REFCOUNTER]
,[PERNR]
,[WORKDATE]
,[CATSHOURS]
,[APDAT]
,[LAETM]
,[CATS_STATUS]
,[APPR_STATUS])
SELECT
[COUNTER],
REFCOUNTER,
PERNR,
WORKDATE,
CATSHOURS,
APDAT,
LAETM,
CATS_STATUS,
APPR_STATUS
FROM
CATSDB
--- We want the opposite of IF @REFCOUNTER <> '' AND @REFCOUNTER IS NOT NULL
WHERE @REFCOUNTER = '' OR @REFCOUNTER IS NULL
It may well be that this gives you enough of a speed boost (you have to adjust your cursor to exclude those cases, of course).
Finally, the clause with the CTE. It's late, I may be missing, something, but the comment says:
--If REFCOUNTER is populated, get the original COUNTER value, then update that row with the new values. Otherwise insert new record
I think you can achieve that with:
UPDATE TEMP
SET
[REFCOUNTER] = c.REFCOUNTER
,[PERNR] = c.PERNR
,[WORKDATE] = c.WORKDATE
,[CATSHOURS] = c.CATSHOURS
,[APDAT] = c.APDAT
,[LAETM] = c.LAETM
,[CATS_STATUS] = c.CATS_STATUS
,[APPR_STATUS] = c.APPR_STATUS
from TEMP t,
CATSDB c
WHERE t.[COUNTER] = C.REFCOUNTER
I think I'm missing what the second part of your UNION in the CTE is achieving.
Upvotes: 1