Dave C
Dave C

Reputation: 7402

SSIS Race Condition

We utilize SSIS (SQL/BIDS 2008) to generate close to 1000 reports per day. The same package is scheduled 5 times in SQL Agent, on staggered schedules.

The SSIS queries a Report table, and if the report hasn't run yet on the given date (check ReportLog) and that report is not running currently (check WorkQueue) the first (top) report is picked up, and ran.

That query is effectively:

SELECT TOP 1 R.ReportID
FROM Report R
LEFT JOIN ReportLog L ON R.ReportID=L.ReportID AND L.RunDate >= CONVERT(DATE,GETDATE())
LEFT JOIN WorkQueue Q ON R.ReportID=Q.ReportID
WHERE R.Active=1
AND L.ReportID IS NULL 
AND Q.ReportID IS NULL

The SSIS package, upon selecting this TOP 1 ReportID (stored in a SSIS variable), inserts the ReportID into WorkQueue, so no other instances of the package will try to use that ReportID. The insert happens in the next step after the select.

Most times, most days, this works fine. However, every so often two SSIS packages end up running at (within a few milliseconds) the exact same time, and both return the same TOP 1 ReportID causing issues with one another as they both execute the same procedures, and manipulate the same underlying data.

Right now, we're considering implementing a parent SSIS package which does the select and places the ReportID into a "buffer" table along with the Job# of one of the 5 child packages to pick up (rather than the child packages all doing their own select top 1), but that seems very hacky.

Another option we considered was an instead of insert trigger on the WorkQueue table which would raise an error on duplicate insertions.

I am wondering if there is any other way within SSIS to prevent this situation without too much redesign.

Upvotes: 1

Views: 186

Answers (1)

billinkc
billinkc

Reputation: 61249

How about avoid the trigger and make use of the OUTPUT clause?

INSERT INTO
    WorkQueue
OUTPUT Inserted.ReportID
SELECT TOP 1 R.ReportID
FROM Report R
    LEFT JOIN ReportLog L ON R.ReportID=L.ReportID AND L.RunDate >= CONVERT(DATE,GETDATE())
    LEFT JOIN WorkQueue Q ON R.ReportID=Q.ReportID
WHERE R.Active=1
    AND L.ReportID IS NULL 
    AND Q.ReportID IS NULL;

This should allow you to pop row row into the workQueue table to claim it for the current process and simultaneously retrieve the value for SSIS to use.

Upvotes: 2

Related Questions