Eli

Reputation: 2608

SSIS package design, where 3rd party data is replacing existing data

I have created many SSIS packages in the past, though the need for this one is a bit different from the others I have written.

Here's the quick description of the business need:

We have a small database on our end sourced from a 3rd party vendor, and this needs to be overwritten nightly.
The source of this data is a bunch of flat files (CSV) from the 3rd party vendor.

Current setup: we truncate the tables of this database, and we then insert the new data from the files, all via SSIS.

Problem: There are times when the files fail to arrive, and what happens is that we truncate the old data but never receive the fresh data set. This leaves us with an empty database, and we would prefer to have yesterday's data over no data at all.

Desired Solution: I would like some sort of mechanism to see if the new data truly exists (these files) prior to truncating our current data.

What I have tried: I tried capturing the data from the files into an ADO recordset and only proceeding if that part was successful. This doesn't seem to work for me: all the data capture activities are in one data flow, and I don't see a way to reuse that data. It also seems wasteful of resources to capture everything and then let the in-memory tables just sit there.

What have you done in a similar situation?

Upvotes: 1

Views: 89

Answers (2)

Eli

Reputation: 2608

I looked around and found that some others were struggling with the same issue, though none of them had a very elegant solution, nor do I.

What I ended up doing was to create a flat file connection to each file of interest and have a task count the records and save the count to a variable. If a file isn't there, the package fails, and you can stop execution at that point.

For some of these files the actual count is interesting to me, though for the most part I don't care. If you don't care what the counts are, you can keep recycling the same variable, which reduces the number of variables you have to create (I needed 31). In order to preserve resources (read: reduce package execution time), I excluded all but one of the columns in each data source; it made a tremendous difference.
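
For what it's worth, the same pre-check can also be done outside the data flow. Here is a minimal PowerShell sketch, assuming the files land in one folder and the script is run from something like an Execute Process Task ahead of the truncate step; the folder and file names below are placeholders, not the real feed:

    # Fail before anything is truncated if any expected file is missing or empty.
    $folder = '\\vendor\drop\nightly'        # placeholder path
    $files  = 'file01.csv', 'file02.csv'     # placeholder names for the expected files
    foreach ($name in $files) {
        $path = Join-Path $folder $name
        if (-not (Test-Path $path)) {
            Write-Error "Missing file: $path"
            exit 1                           # nonzero exit code fails the calling task
        }
        if ((Get-Item $path).Length -eq 0) {
            Write-Error "Empty file: $path"
            exit 1
        }
    }
    exit 0                                   # everything present; safe to truncate

If the task is set to fail on a nonzero return code, the rest of the package only runs when every file checks out.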

Upvotes: 0

observer

Reputation: 316

  1. If the files are not present, update flags like IsFile1Found to false and pass these flags to a stored procedure which truncates on a conditional basis.
  2. If a file might be empty, you can use PowerShell through an Execute Process Task to extract the first two rows; if there are two rows (header + data row), the data file is not empty, and you can then truncate the table and import the data (see the sketch below). Another approach is to load the data into staging tables, insert from the staging tables into the destination tables using a SQL stored procedure, and truncate the staging tables after the data has been moved to all the destination tables. That way, before truncating a destination table, you can check whether the staging tables are empty or not.
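
As a rough sketch of the PowerShell idea in point 2 (the path below is just an example, and the exit code is what the Execute Process Task would check):

    # Read only the first two lines: a header plus at least one data row.
    $path = 'C:\vendor\file01.csv'   # example path; pass the real one as an argument
    $rows = @(Get-Content -Path $path -TotalCount 2 -ErrorAction SilentlyContinue)
    if ($rows.Count -ge 2) {
        exit 0    # header + data row found: safe to truncate and import
    }
    exit 1        # file missing, empty, or header-only: leave the current data alone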

Upvotes: 1
