Running large queries in the background MS SQL

Question

I am using MS SQL Server 2008 i have a table which is constantly in use (data is always changing and inserted to it) it contains now ~70 Mill rows, I am trying to run a simple query over the table with a stored procedure that should properly take a few days,

I need the table to keep being usable, now I executed the stored procedure and after a while every simple select by identity query that I try to execute on the table is not responding/running too much time that I break it

what should I do? here is how my stored procedure looks like:

 SET NOCOUNT ON;
update SOMETABLE
set
[some_col] = dbo.ufn_SomeFunction(CONVERT(NVARCHAR(500), another_column))
WHERE 
[some_col] = 243

even if i try it with this on the where clause (with an 'and' logic..) :

ID_COL > 57000000 and ID_COL < 60000000 and

it still doesn't work

BTW- SomeFunction does some simple mathematics actions and looks up rows in another table that contains about 300k items, but is never changed

TToni · Accepted Answer

From my perspective your server has a serious performance problem. Even if we assume that none of the records in the query

select some_col with (nolock) where id_col between 57000000 and 57001000

was in memory, it shouldn't take 21 seconds to read the few pages sequentially from disk (your clustered index on the id_col should not be fragmented if it's an auto-identity and you didn't do something stupid like adding a "desc" to the index definition).

But if you can't/won't fix that, my advice would be to make the update in small packages like 100-1000 records at a time (depending on how much time the lookup function consumes). One update/transaction should take no more than 30 seconds.

You see each update keeps an exclusive lock on all the records it modified until the transaction is complete. If you don't use an explicit transaction, each statement is executed in a single, automatic transaction context, so the locks get released when the update statement is done.

But you can still run into deadlocks that way, depending on what the other processes do. If they modify more than one record at a time, too, or even if they gather and hold read locks on several rows, you can get deadlocks.

To avoid the deadlocks, your update statement needs to take a lock on all the records it will modify at once. The way to do this is to place the single update statement (with only the few rows limited by the id_col) in a serializable transaction like

IF @@TRANCOUNT > 0
  -- Error: You are in a transaction context already

SET NOCOUNT ON
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE

-- Insert Loop here to work "x" through the id range
  BEGIN TRANSACTION
    UPDATE SOMETABLE
      SET [some_col] = dbo.ufn_SomeFunction(CONVERT(NVARCHAR(500), another_column))
      WHERE [some_col] = 243 AND id_col BETWEEN x AND x+500 -- or whatever keeps the update in the small timerange
  COMMIT
-- Next loop

-- Get all new records while you where running the loop. If these are too many you may have to paginate this also:
BEGIN TRANSACTION
  UPDATE SOMETABLE
    SET [some_col] = dbo.ufn_SomeFunction(CONVERT(NVARCHAR(500), another_column))
    WHERE [some_col] = 243 AND id_col >= x
COMMIT

For each update this will take an update/exclusive key-range lock on the given records (but only them, because you limit the update through the clustered index key). It will wait for any other updates on the same records to finish, then get it's lock (causing blocking for all other transactions, but still only for the given records), then update the records and release the lock.

The last extra statement is important, because it will take a key range lock up to "infinity" and thus prevent even inserts on the end of the range while the update statement runs.

Running large queries in the background MS SQL

Answers (1)

Related Questions