andr111
andr111

Reputation: 3052

T-SQL long running query inside a view

I have a view that has quite a complex query (see below). I then use this view to insert data into another table. I have lots of data in the database and when I run the insert query it runs for 8 hours and does the actual insert only in the end. It seems that it first fetches all results from the view and only then inserts them into my table. Is it possible to make it insert each record separately?

Here is the view:

CREATE VIEW [dbo].[Enrollment]
AS
WITH CTE AS (SELECT     RN = ROW_NUMBER() OVER (PARTITION BY PRIMARYPROVIDERCODE, CLIENTNUMBER
                               ORDER BY PRIMARYPROVIDERCODE, CLIENTNUMBER, STARTINGDATE), ID, PRIMARYPROVIDERCODE, CLIENTNUMBER, STARTINGDATE, ENDINGDATE
FROM         AUTHORIZE
WHERE     DOCREVNO = 0 AND CMT = 'N'
GROUP BY PRIMARYPROVIDERCODE, CLIENTNUMBER, STARTINGDATE, ENDINGDATE, ID)
    SELECT     [Current Row].ID, [Current Row].RN, [Current Row].PRIMARYPROVIDERCODE, [Current Row].CLIENTNUMBER, [Current Row].STARTINGDATE, 
                            ENDINGDATE =
                                (SELECT     TOP 1 [Next Ending].ENDINGDATE
                                  FROM          CTE[Next Ending]
                                  WHERE      [Next Ending].RN >= [Current Row].RN AND [Next Ending].ENDINGDATE IS NOT NULL AND 
                                                         [Next Ending].PRIMARYPROVIDERCODE = [Current Row].PRIMARYPROVIDERCODE AND 
                                                         [Next Ending].CLIENTNUMBER = [Current Row].CLIENTNUMBER
                                  ORDER BY [Next Ending].RN)
     FROM         CTE[Current Row] INNER JOIN
                            CTE[Previous Row] ON ([Previous Row].PRIMARYPROVIDERCODE = [Current Row].PRIMARYPROVIDERCODE AND 
                            [Previous Row].CLIENTNUMBER = [Current Row].CLIENTNUMBER) AND (([Previous Row].RN = [Current Row].RN AND [Current Row].RN = 1) OR
                            ([Previous Row].RN = [Current Row].RN - 1 AND [Previous Row].ENDINGDATE IS NOT NULL))

Here is the INSERT query:

INSERT INTO [dbo].[clientenrollment]
           ([DOCSERNO]
           ,[DOCREVNO]
           ,[USERID]
           ,[SIGNED]
           ,[TIMESTAMP]
           ,[ClientNumber]
           ,[enrollmentdate]
           ,[terminationdate]
           ,[PrimaryProviderCode]
           ,[Action]
           ,[Providername]
           ,[SERVICEMAPCODE])
SELECT REPLACE(STR(6620100322000000 + row_number() over (order by ev.ID asc), 17, 0), ' ', '0')
           ,0
           , NULL
           , 0
           , GETDATE()
           ,ev.CLIENTNUMBER
           ,ev.STARTINGDATE
           ,ev.ENDINGDATE
           ,ev.PRIMARYPROVIDERCODE
           ,auth.ACTION
           ,p.PROVIDERNAME
           ,ms.MapCode  
FROM [dbo].[Enrollment] AS ev
JOIN AUTHORIZE auth ON auth.ID = ev.ID
LEFT JOIN MasterService ms ON ms.Code = auth.SERVICECODE
LEFT JOIN PROVIDER p ON p.PROVIDERCODE = ev.PRIMARYPROVIDERCODE
WHERE p.DOCREVNO = 0

Upvotes: 0

Views: 382

Answers (4)

HLGEM
HLGEM

Reputation: 96600

Part of your problem is the use of a view is a poor idea here. You have to generate the whole view and then you join to a table in the view and filter down. First thing I would do is ditch the view . Then I would ditch the correlated subquery as those have to run row by row and you woudl get better performacen form a derived table in a join. I don't have time to go through this in detail and figure out exactly what you are trying to accomplish, but I will point out that when you have to try so hard to accomplish somthing you usually have a design problem in your database structure.

Upvotes: 1

Chris Shaffer
Chris Shaffer

Reputation: 32575

I'll second Cade Roux in saying that it seems like 8 hours is longer than I would expect an optimized query to run and you certainly should check the indexes on the table.

I'll also add that perhaps simplifying the JOIN clause and the subquery WHERE/implicit JOIN clause might be worth something. You could do this by using the RANK() function to group the similar rows and then join on those group numbers instead of the multiple columns you are currently using.

Not guaranteed to be perfect, but something like this:

CREATE VIEW [dbo].[Enrollment]
AS
WITH CTE AS (SELECT     GN = RANK() OVER (ORDER BY PRIMARYPROVIDERCODE, CLIENTNUMBER) , RN = ROW_NUMBER() OVER (PARTITION BY PRIMARYPROVIDERCODE, CLIENTNUMBER
                               ORDER BY PRIMARYPROVIDERCODE, CLIENTNUMBER, STARTINGDATE), ID, PRIMARYPROVIDERCODE, CLIENTNUMBER, STARTINGDATE, ENDINGDATE
FROM         AUTHORIZE
WHERE     DOCREVNO = 0 AND CMT = 'N'
GROUP BY PRIMARYPROVIDERCODE, CLIENTNUMBER, STARTINGDATE, ENDINGDATE, ID)
    SELECT     [Current Row].ID, [Current Row].RN, [Current Row].PRIMARYPROVIDERCODE, [Current Row].CLIENTNUMBER, [Current Row].STARTINGDATE, 
                            ENDINGDATE =
                                (SELECT     TOP 1 [Next Ending].ENDINGDATE
                                  FROM          CTE[Next Ending]
                                  WHERE      [Next Ending].RN >= [Current Row].RN AND [Next Ending].ENDINGDATE IS NOT NULL AND 
                                                         [Next Ending].GN = [Current Row].GN
                                  ORDER BY [Next Ending].RN)
     FROM         CTE[Current Row] INNER JOIN
                            CTE[Previous Row] ON ([Previous Row].GN = [Current Row].GN) AND (([Previous Row].RN = [Current Row].RN AND [Current Row].RN = 1) OR
                            ([Previous Row].RN = [Current Row].RN - 1 AND [Previous Row].ENDINGDATE IS NOT NULL))

In similar situations I've had hour+ queries turn into minute+ queries with this approach, but of course every situation is different...

Upvotes: 0

Cade Roux
Cade Roux

Reputation: 89721

Is it possible to make it insert each record separately

Yes and no.

It is ALREADY inserting each record separately, but it only commits at the very end. Sometimes on a very long insert if you use dirty reads, you can see row counts increasing on a single very long insert. No other transaction normally sees the records until they are committed, because that's an ACID thing, and the whole transaction could be rolled back.

As far as how it determines the rows to insert depends upon the execution plan. Because you have some sorting (ORDER BY) and some common table-expressions, determining the rows may take a while and it certainly is possible that the majority of the operation is determining the rows to be inserted (perhaps spooling to temp storage during this operation) and then they are very rapidly inserted.

It could be possible to insert single rows in individual transactions by breaking your operation up in some way. Of course if the operation is halted, you may have only part of the entire operation completed.

Alternatively, I think 8 hours is far too long for a query like this to run, and given you haven't mentioned anything about indexing strategies or the number of rows in any of your tables or the execution plan, I would first look at those things and learn about them.

Do you have an index on PRIMARYPROVIDERCODE, CLIENTNUMBER, STARTINGDATE? Also note that the WHERE clause: DOCREVNO = 0 AND CMT = 'N' might need to be first in that index as well. Look at the execution plan to see what's happening. Even if you don't have access to production, the production DBA should be able to give you the execution plan.

Upvotes: 2

Related Questions