user867198

Reputation: 1648

Select and Delete in the same transaction using TOP clause

I have a table to which data is continuously added at a rapid pace.

I need to fetch records from this table and immediately remove them, so I cannot process the same record a second time. And since the data is being added at a fast rate, I need to use the TOP clause so that only a small number of records go to the business logic for processing at a time.

I am using the query below:

BEGIN TRAN readrowdata;

SELECT TOP 5
    [RawDataId],
    [RawData]
FROM [TABLE] WITH (HOLDLOCK);

WITH q AS
(
    SELECT TOP 5
        [RawDataId],
        [RawData]
    FROM [TABLE] WITH (HOLDLOCK)
)
DELETE FROM q;

COMMIT TRANSACTION readrowdata;

I am using HOLDLOCK here so that new data cannot be inserted into the table while I am performing the SELECT and DELETE operations. I used it because, suppose there are only 3 records in the table: the SELECT statement will get those 3 records, and if a new record gets inserted in the meantime, the DELETE statement would delete 4 records. So I would lose 1 record here.

Is the query OK in performance terms? If I can improve it, please provide your suggestions.

Thank you

Upvotes: 3

Views: 8383

Answers (5)

Nikita Danilov

Reputation: 108

I know it's an old question, but I found a solution at https://www.simple-talk.com/sql/learn-sql-server/the-delete-statement-in-sql-server/:

DECLARE @Output table
(
  StaffID INT,
  FirstName NVARCHAR(50),
  LastName NVARCHAR(50),
  CountryRegion NVARCHAR(50)
);
DELETE SalesStaff
OUTPUT DELETED.* INTO @Output
FROM Sales.vSalesPerson sp
  INNER JOIN dbo.SalesStaff ss
  ON sp.BusinessEntityID = ss.StaffID
WHERE sp.SalesLastYear = 0;
SELECT * FROM @Output;

Maybe it will be helpful for you.
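
Applied to the question's table, a minimal sketch of the same idea: combining TOP with DELETE ... OUTPUT makes the fetch and the delete a single atomic statement (the column type for RawData is assumed, since the question does not show it):

DECLARE @Batch TABLE
(
    RawDataId INT,
    RawData NVARCHAR(MAX)  -- assumed type; match your actual column
);

-- One atomic statement: no window between the read and the delete
DELETE TOP (5) FROM [TABLE]
OUTPUT DELETED.RawDataId, DELETED.RawData INTO @Batch;

SELECT * FROM @Batch;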

Upvotes: 0

Diego

Reputation: 36156

If I understand you correctly, you are worried that between your select and your delete, more records could be inserted and the first TOP 5 would be different from the second TOP 5?

If that's so, why don't you load your first select into a temp table or table variable (or at least the PKs), do whatever you have to do with your data, and then do your delete based on this table?
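
A minimal sketch of that approach, reusing the table and key column names from the question:

DECLARE @batch TABLE (RawDataId INT PRIMARY KEY);

BEGIN TRAN;

-- Capture the keys of the rows to be processed
INSERT INTO @batch (RawDataId)
SELECT TOP 5 RawDataId
FROM [TABLE];

-- ... process the captured rows here ...

-- Delete exactly the captured rows, even if new rows arrived meanwhile
DELETE t
FROM [TABLE] t
INNER JOIN @batch b ON b.RawDataId = t.RawDataId;

COMMIT TRAN;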

Upvotes: 0

Pankaj

Reputation: 10105

You can do it very easily with TRIGGERS. The example below handles this kind of situation and means you do not need to hold up other users who are trying to insert data at the same time. Like below...

Data Definition Language

CREATE TABLE SampleTable
(
    id int
)

Sample Record

INSERT INTO SampleTable (id) VALUES (1)

Sample Trigger

CREATE TRIGGER SampleTableTrigger
ON SampleTable AFTER INSERT
AS
IF EXISTS (SELECT id FROM INSERTED)
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;
    BEGIN TRY
        BEGIN TRAN
            -- Hand the newly inserted ids to the caller for processing
            SELECT id FROM INSERTED;
            -- Remove them immediately so they cannot be processed twice
            DELETE FROM SampleTable WHERE id IN (SELECT id FROM INSERTED);
        COMMIT TRAN
    END TRY
    BEGIN CATCH
        ROLLBACK TRAN
    END CATCH
END
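
With the trigger in place, any insert is immediately echoed back and removed, for example:

-- Fires the trigger: the new id is returned to the caller,
-- and the row is deleted right after the insert
INSERT INTO SampleTable (id) VALUES (2)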

Hope this is very simple and helpful.

Upvotes: 0

MatBailie

Reputation: 86765

Personally, I'd use a different approach. One with less locking, but also extra information signifying that certain records are currently being processed...

DECLARE @rowsBeingProcessed TABLE (
    id INT
);

WITH rows AS (
    SELECT TOP 5 RawDataId, processing_start
    FROM yourTable
    WHERE processing_start IS NULL
)
UPDATE rows
SET processing_start = GETDATE()
OUTPUT INSERTED.RawDataId INTO @rowsBeingProcessed;

-- Business Logic Here

DELETE yourTable WHERE RawDataId IN (SELECT id FROM @rowsBeingProcessed);

Then you can also add checks like "if a record has been marked as being processed for more than 10 minutes, assume that the business logic failed", etc.
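
A sketch of such a reclaim step, reusing the processing_start column from the answer above:

-- Rows stuck "in processing" for over 10 minutes become eligible again
UPDATE yourTable
SET processing_start = NULL
WHERE processing_start IS NOT NULL
  AND processing_start < DATEADD(MINUTE, -10, GETDATE());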

Upvotes: 3

Neville Kuyt

Reputation: 29639

By locking the table in this way, you force other processes to wait for your transaction to complete. This can have serious consequences for scalability and performance, and those consequences tend to be hard to predict, because there's often a chain of components all relying on your database.

If you have multiple clients each running this query, and multiple clients adding new rows to the table, overall system performance is likely to deteriorate at times: each "read" client waits for a lock, the number of "write" clients waiting to insert data grows, and they in turn may tie up other components (whatever is generating the data you want to insert).

Diego's answer is on the money - put the data into a variable, and delete matching rows. Don't use locks in SQL Server if you can possibly avoid it!

Upvotes: 1
