Kamyar

Reputation: 18797

Using triggers to update values

I'm trying to enhance the performance of a SQL Server 2000 job. Here's the scenario:
Table A has a maximum of 300,000 rows. If I update/delete the 100th row (based on insertion time), all the rows added after that row should have their values updated: row no. 101 should be updated based on row no. 100, and row no. 102 should be updated based on row no. 101's updated value, e.g.:

Old Table:

ID      Value
100     220
101     (220/2) = 110
102     (110/2) = 55
...

Row no. 100 is then updated with a new value: 300.

New Table:

ID      Value
100     300
101     (300/2) = 150
102     (150/2) = 75
...

The actual value calculation is more complex; the formula above is simplified for illustration.
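For illustration, a simplified sketch of the table (the real names, types, and columns differ):

CREATE TABLE TableA
(
  ID int NOT NULL PRIMARY KEY CLUSTERED,  -- stands in for insertion order
  Value decimal(18, 2) NOT NULL           -- derived from the previous row's value
);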

Right now, a trigger is defined for update/delete statements. When a row is updated or deleted, the trigger adds the row's data to a log table. In addition, a SQL job is created from code-behind after each update/delete; it fires a stored procedure that iterates through all the subsequent rows of table A and updates their values. The process takes ~10 days to complete for 300,000 rows.
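The iteration is roughly along these lines (a simplified sketch of the loop, reusing the /2 formula and the names from the sketch above):

DECLARE @id int, @value decimal(18, 2);
SELECT @id = 100, @value = 300;  -- the row that was changed, and its new value

WHILE EXISTS (SELECT * FROM TableA WHERE ID > @id)
BEGIN
  SELECT TOP 1 @id = ID FROM TableA WHERE ID > @id ORDER BY ID;
  SET @value = @value / 2;                          -- the simplified formula
  UPDATE TableA SET Value = @value WHERE ID = @id;  -- fires the trigger again each time
END;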

When the SP fires, it updates the subsequent rows' values. I think this causes the trigger to run again for every row the SP updates, adding those rows to the log table too. Also, the task has to be done on the database side, as the customer requested.

To solve the problem, I thought of this: modify the stored procedure and call it directly from the trigger. The stored procedure then drops the trigger, updates the subsequent rows' values, and re-creates the trigger.

Thank you.

Upvotes: 0

Views: 1539

Answers (2)

Andriy M

Reputation: 77707

First, about the update process. As I understand it, your procedure simply calls itself when it comes to updating the next row. With 300K rows this is certainly not going to be very fast, even without the logging (though it would most probably take far fewer days to accomplish). But what is absolutely beyond me is how it is possible to update more than 32 rows that way without hitting the maximum nesting level of 32. Maybe I've got the sequence of actions wrong.

Anyway, I would probably do that differently, with just one instruction:

UPDATE yourtable
SET @value = Value = CASE ID
                       WHEN @id THEN @value
                       ELSE @value / 2 /* basically, your formula */
                     END
WHERE ID >= @id
OPTION (MAXDOP 1);

The OPTION (MAXDOP 1) bit limits the statement's degree of parallelism to 1, thus making sure the rows are updated sequentially and every value is based on the previous one, i.e. on the value from the row with the preceding ID. Also, the ID column should be the clustered index key, which it typically is by default when it's made the primary key.
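To see the technique in action, here is a self-contained test against a small temporary table (using the simplified /2 formula and made-up types):

CREATE TABLE #t (ID int NOT NULL PRIMARY KEY CLUSTERED, Value decimal(18, 2) NOT NULL);
INSERT INTO #t (ID, Value) VALUES (100, 300);  -- already holds the new value
INSERT INTO #t (ID, Value) VALUES (101, 110);  -- stale, to be recalculated
INSERT INTO #t (ID, Value) VALUES (102, 55);   -- stale, to be recalculated

DECLARE @id int, @value decimal(18, 2);
SELECT @id = 100, @value = 300;

UPDATE #t
SET @value = Value = CASE ID WHEN @id THEN @value ELSE @value / 2 END
WHERE ID >= @id
OPTION (MAXDOP 1);

SELECT ID, Value FROM #t;  -- expected: 300.00, 150.00, 75.00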

The other functionality of the update procedure, i.e. dropping and recreating the trigger, should probably be replaced by disabling and re-enabling it:

ALTER TABLE yourtable DISABLE TRIGGER yourtabletrigger

/* the update part */

ALTER TABLE yourtable ENABLE TRIGGER yourtabletrigger
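If you did go down this road, the whole sequence would have to run in a single transaction anyway: the ALTER TABLE statements take a schema lock that is held until commit, which should also keep other sessions from touching the table while the trigger is off. A sketch:

BEGIN TRANSACTION;

ALTER TABLE yourtable DISABLE TRIGGER yourtabletrigger;

DECLARE @id int, @value decimal(18, 2);
SELECT @id = 100, @value = 300;  -- the changed row and its new value

UPDATE yourtable                 /* the update part, as above */
SET @value = Value = CASE ID WHEN @id THEN @value ELSE @value / 2 END
WHERE ID >= @id
OPTION (MAXDOP 1);

ALTER TABLE yourtable ENABLE TRIGGER yourtabletrigger;

COMMIT TRANSACTION;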

But then, you are saying the trigger shouldn't actually be dropped/disabled, because several users might update the table at the same time.

All right then, we are not touching the trigger.

Instead, I would suggest adding a special column to the table, one the users shouldn't be aware of, or at least shouldn't care much about, and should somehow be prevented from ever touching. That column should only be updated by your 'cascading update' process. By checking whether that column is being updated, you would know whether you should call the update procedure and the logging.

So, in your trigger there could be something like this:

IF NOT UPDATE(SpecialColumn) BEGIN
  /* assuming that without SpecialColumn only one row can be updated */
  SELECT TOP 1 @id = ID, @value = Value FROM inserted;
  EXEC UpdateProc @id, @value;
  EXEC LogProc ...;
END
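Fleshed out, the trigger could look something like this (I'm guessing at the types, and LogProc's parameters are a placeholder for whatever your logging actually takes):

CREATE TRIGGER yourtabletrigger ON yourtable
FOR UPDATE
AS
BEGIN
  DECLARE @id int, @value decimal(18, 2);

  IF NOT UPDATE(SpecialColumn)
  BEGIN
    /* assuming that without SpecialColumn only one row can be updated */
    SELECT TOP 1 @id = ID, @value = Value FROM inserted;
    EXEC UpdateProc @id, @value;
    EXEC LogProc @id, @value;  /* placeholder parameters */
  END;
END;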

In UpdateProc:

UPDATE yourtable
SET @value = Value = @value / 2,
    SpecialColumn = SpecialColumn /* basically, just anything, since it can
                                     only be updated by this procedure */
WHERE ID > @id
OPTION (MAXDOP 1);

You may have noticed that the UPDATE statement is slightly different this time. I understand your trigger is FOR UPDATE (= AFTER UPDATE), which means the @id row will already have been updated by the user. So the procedure should skip it and start from the very next row, and the update expression can now be just the formula.
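Put together, UpdateProc might look like this (again, the types are assumptions and @value / 2 stands in for your real formula):

CREATE PROCEDURE UpdateProc
  @id int,
  @value decimal(18, 2)
AS
BEGIN
  SET NOCOUNT ON;

  UPDATE yourtable
  SET @value = Value = @value / 2,   -- the running value, carried from row to row
      SpecialColumn = SpecialColumn  -- touched only so the trigger's UPDATE() check catches it
  WHERE ID > @id
  OPTION (MAXDOP 1);
END;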

In conclusion, I'd like to say that my test update involved 299,995 of my table's 300,000 rows and took approximately 3 seconds on my not-so-very-fast system. No logging, of course, but that should give you a basic picture of how fast it can be.

Upvotes: 1

Ken Downs

Reputation: 4827

Big theoretical problem here. It is always extremely suspicious when updating one row REQUIRES updating 299,900 other rows. It suggests a deep flaw in the data model. Not that it is never appropriate, just that it is required far, far less often than people think. When things like this are absolutely necessary, they are usually done as a batch operation.

The best you can hope for, in some miraculous situation, is to turn those 10 days into 10 minutes, but never into 10 seconds. I would suggest explaining thoroughly WHY this seems necessary, so that another approach can be explored.

Upvotes: 1
