Eli

Reputation: 2608

SSIS package design, where 3rd party data is replacing existing data

I have created many SSIS packages in the past, though the need for this one is a bit different from the others I have written.

Here's the quick description of the business need:

We have a small database on our end sourced from a 3rd party vendor, and this needs to be overwritten nightly.
The source of this data is a bunch of flat files (CSV) from the 3rd party vendor.

Current setup: we truncate the tables of this database, and we then insert the new data from the files, all via SSIS.

Problem: There are times when the files fail to arrive, and what happens is that we truncate the old data but never receive the fresh data set. This leaves us with an empty database, and we would prefer to have yesterday's data over no data at all.

Desired Solution: I would like some sort of mechanism to see if the new data truly exists (these files) prior to truncating our current data.

What I have tried: I tried capturing the data from the files into an ADO recordset and only proceeding if that part was successful. This doesn't seem to work for me: all the data capture activities are in one data flow, and I don't see a way to reuse that data. It also seems wasteful of resources to capture everything and then let the in-memory tables just sit there.

What have you done in a similar situation?

Upvotes: 1

Views: 89

Answers (2)

Eli

Reputation: 2608

I looked around and found that some others were struggling with the same issue, though none of them had a very elegant solution, nor do I.

What I ended up doing was to create a flat file connection to each file of interest and have a task count the records and save the count to a variable. If a file isn't there, the package fails, and you can stop execution at that point.

For some of these files the actual count is interesting to me, though for the most part I don't care. If you don't care what the counts are, you can keep recycling the same variable, which reduces the number of variables you have to create (I needed 31). In order to preserve resources (read: reduce package execution time), I excluded all but one of the columns in each data source; it made a tremendous difference.
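
For what it's worth, the same pre-check can also be done outside the data flow. Here is a minimal PowerShell sketch, assuming the files land in one folder and the script is run from something like an Execute Process Task ahead of the truncate step; the folder and file names below are placeholders, not the real feed:

    # Fail before anything is truncated if any expected file is missing or empty.
    $folder = '\\vendor\drop\nightly'        # placeholder path
    $files  = 'file01.csv', 'file02.csv'     # placeholder names for the expected files
    foreach ($name in $files) {
        $path = Join-Path $folder $name
        if (-not (Test-Path $path)) {
            Write-Error "Missing file: $path"
            exit 1                           # nonzero exit code fails the calling task
        }
        if ((Get-Item $path).Length -eq 0) {
            Write-Error "Empty file: $path"
            exit 1
        }
    }
    exit 0                                   # everything present; safe to truncate

If the task is set to fail on a nonzero return code, the rest of the package only runs when every file checks out.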

Upvotes: 0

observer

Reputation: 316

  1. If the files are not present, update flags like IsFile1Found to false and pass these flags to a stored procedure which truncates on a conditional basis.
  2. If a file might be empty, you can use PowerShell through an Execute Process Task to extract the first two rows; if there are two rows (header + data row), the data file is not empty, and you can then truncate the table and import the data (see the sketch below). Another approach is to load the data into staging tables, insert from the staging tables into the destination tables using a SQL stored procedure, and truncate the staging tables after the data has been moved to all the destination tables. That way, before truncating a destination table, you can check whether the staging tables are empty or not.
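
As a rough sketch of the PowerShell idea in point 2 (the path below is just an example, and the exit code is what the Execute Process Task would check):

    # Read only the first two lines: a header plus at least one data row.
    $path = 'C:\vendor\file01.csv'   # example path; pass the real one as an argument
    $rows = @(Get-Content -Path $path -TotalCount 2 -ErrorAction SilentlyContinue)
    if ($rows.Count -ge 2) {
        exit 0    # header + data row found: safe to truncate and import
    }
    exit 1        # file missing, empty, or header-only: leave the current data alone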

Upvotes: 1
