Ben Thul

Reputation: 32697

SSIS incremental load

I'm attempting to transfer approximately 1 billion rows with an SSIS package over an unstable line. As such, it keeps failing part way through. I'd like some way to make it restartable. I attempted to put a lookup transform between the source and destination, but that makes it way too slow. Is there another way to do what I'm trying to do without taking such a performance hit?

Upvotes: 0

Views: 788

Answers (2)

uh_big_mike_boi

Reputation: 3470

Is it worth doing something like this, a million rows at a time?

DECLARE @Counter INT = 1, @ReturnCode INT
DECLARE @Rows INT = 1000000, @Goal INT = 1000000000
DECLARE @Cmd VARCHAR(4000)
  WHILE (@Counter * @Rows <= @Goal)
    BEGIN
      -- xp_cmdshell wants a single string, so build the dtexec call first
      SET @Cmd = 'dtexec /f "C:\path\package.dtsx" /SET \Package.Variables[User::Counter].Value;"'
                 + CONVERT(VARCHAR(10), @Counter) + '"'

      EXEC @ReturnCode = xp_cmdshell @Cmd

      IF (@ReturnCode = 0)
        SET @Counter = @Counter + 1   -- that chunk made it, move on to the next one
      ELSE
        PRINT 'Failed this at this time dude: ' + CONVERT(VARCHAR(30), GETDATE())
    END

And then, inside your OLE DB source component, give your query a WHERE clause like

    WHERE TableID BETWEEN (SELECT MIN(TableID) FROM Table) + ((? - 1) * 1000000)
          AND (SELECT MIN(TableID) FROM Table) + (? * 1000000)

A couple of things, though: I think you will have to build the query in an expression on an SSIS variable (have you ever done that before? It's easy, and it avoids the error you can get when you use a parameter inside a subquery), and you should probably grab the MIN(TableID) just once at the beginning and store it in a variable too.
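For example, the expression on a string variable feeding the OLE DB source might look roughly like this (just a sketch; User::MinTableID is a hypothetical variable holding that MIN(TableID), and User::Counter is the chunk number passed in by dtexec):

    "SELECT * FROM Table WHERE TableID BETWEEN "
    + (DT_WSTR, 20)((DT_I8)@[User::MinTableID] + ((DT_I8)@[User::Counter] - 1) * 1000000)
    + " AND "
    + (DT_WSTR, 20)((DT_I8)@[User::MinTableID] + (DT_I8)@[User::Counter] * 1000000)

Set the OLE DB source's data access mode to "SQL command from variable" and point it at that variable; that sidesteps the parameter-in-a-subquery error entirely, and the DT_I8 casts keep the arithmetic from overflowing once you're near the billionth row.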

Is it all worth this much trouble? Do you think this will help?

Upvotes: 1

billinkc

Reputation: 61211

My initial approach would be to write a package that, when it starts, identifies a subset of the data to be transferred, records which subset it is working on, and attempts to transfer that data. If it completes, it marks that subset as transferred and exits. Otherwise, well, it's already blown up and there's nothing more for that package to do.
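
A minimal sketch of that bookkeeping, assuming a hypothetical dbo.TransferBatch table that carves the key range into chunks (the table and column names here are mine, not anything built in):

    -- Hypothetical tracking table: one row per range of keys to move
    CREATE TABLE dbo.TransferBatch
    (
        BatchID    INT IDENTITY(1, 1) PRIMARY KEY,
        RangeStart BIGINT NOT NULL,
        RangeEnd   BIGINT NOT NULL,
        Status     VARCHAR(20) NOT NULL DEFAULT 'Pending',  -- Pending / InProcess / Done
        ClaimedAt  DATETIME NULL
    );

    -- At package start: claim one pending batch and remember which one we grabbed
    DECLARE @BatchID INT;

    UPDATE TOP (1) dbo.TransferBatch
    SET Status = 'InProcess',
        ClaimedAt = GETDATE(),
        @BatchID = BatchID
    WHERE Status = 'Pending';

    -- After the data flow finishes cleanly: mark that batch done
    UPDATE dbo.TransferBatch
    SET Status = 'Done'
    WHERE BatchID = @BatchID;

If you later run several copies in parallel, add READPAST/UPDLOCK hints (or an OUTPUT clause) to the claim so two packages can't grab the same batch.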

Another process would run every X timeframe and attempt to find failed transfers (subsets marked as in process but older than Y duration). It would then delete those rows from the transfer table, or mark them as eligible for transfer again. The general idea is that anything that broke gets flagged for a do-over.
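
Against that same hypothetical table, the sweeper can be little more than:

    -- Clear any half-loaded rows for batches that stalled more than 2 hours ago
    DELETE d
    FROM dbo.DestinationTable AS d             -- hypothetical destination table name
    JOIN dbo.TransferBatch AS b
        ON d.TableID BETWEEN b.RangeStart AND b.RangeEnd
    WHERE b.Status = 'InProcess'
      AND b.ClaimedAt < DATEADD(HOUR, -2, GETDATE());

    -- ...then put those batches back in the queue for a do-over
    UPDATE dbo.TransferBatch
    SET Status = 'Pending',
        ClaimedAt = NULL
    WHERE Status = 'InProcess'
      AND ClaimedAt < DATEADD(HOUR, -2, GETDATE());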

It's a pretty simple design; the hardest part is segmenting the data and keeping track of what has and hasn't been transferred. Then simply set up a SQL Agent job to fire the package every N timeframes. If your network weren't the faulty part, a nice thing about doing a data transfer like this is that it lets you parallelize your executions for maximum throughput. The SSIS team took a similar approach when they set the record for loading data. They also goosed the hell out of the system, but that's to be expected.
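
If it helps, the agent piece is just a job with a CmdExec step calling dtexec plus a short schedule; a rough sketch (job name, schedule, and package path are all placeholders):

    USE msdb;
    GO

    EXEC dbo.sp_add_job @job_name = N'Transfer next batch';

    -- One step: shell out to dtexec for the restartable package
    EXEC dbo.sp_add_jobstep
        @job_name  = N'Transfer next batch',
        @step_name = N'Run package',
        @subsystem = N'CmdExec',
        @command   = N'dtexec /f "C:\path\package.dtsx"';

    -- Fire it every 15 minutes, all day
    EXEC dbo.sp_add_jobschedule
        @job_name             = N'Transfer next batch',
        @name                 = N'Every 15 minutes',
        @freq_type            = 4,   -- daily
        @freq_interval        = 1,
        @freq_subday_type     = 4,   -- subday unit = minutes
        @freq_subday_interval = 15;

    EXEC dbo.sp_add_jobserver @job_name = N'Transfer next batch';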

If the 1B-row system has lots of updates going on, then your data segmenting logic will need some mechanism to identify changes and ensure those records are fed back into the system, but since you didn't specify that as a need, I'll ignore it for now.

If you are trying to use a lookup, ensure you are only pulling back what is absolutely essential; for this case, I would hope it's a very narrow key. If you're taking the segmented data load approach, then make sure you use the same partitioning logic in your lookup transformation. Otherwise, you'll pull back roughly (1B - transferSize) rows of data on your final run, which will undoubtedly play havoc with your shoddy network.
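
As a contrived example of "narrow and partitioned": the lookup's query for one 1,000,000-row chunk would be just the key over just that slice, with the bounds built the same counter-based way as the source query (the table name and literal bounds here are stand-ins for one example chunk):

    -- Only the key column, only the slice this run can touch
    SELECT TableID
    FROM dbo.DestinationTable
    WHERE TableID BETWEEN 2000001 AND 3000000;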

Lots of generalities but sing out if you want more details on a facet.

Upvotes: 2
