Reputation: 7419
So, I have a bunch of data that I'm trying to import using SSIS. The problem I'm having is that some of the data is outdated. So I want to only import the most recent data. I have a key that indicates which set of data each row belongs and I only want to import the most row per key.
What is the best way to do this in SSIS?
My only thought would be to use two sort transform. The first would sort by date. The second would sort by my key, and eliminate duplicate rows. This would only work if the sort was guaranteed to maintain the previous order. Does anyone know if this holds true? Or does the second sort completely eliminate order the first sort put into place?
Upvotes: 3
Views: 652
Reputation: 89
I usually split (multicast) my dataset : one to aggregate the value I want to keep, the other one is used to merge with the first dataset. For example, I have an history of position by employee (Employee, Date, Position)
I split my dataset to retrieve the last history date by employee (aggregate employee and max date) and I sort it by employee => 1.Employee + 1.last_date I merge my 2 dataset => 1.Employee = 2.Employee AND 1.last_date = 2.date
Upvotes: 0
Reputation: 89661
I don't think you can rely on the sort order. You can sort by multiple keys in a single sort - perhaps sending it through a script task at that point to do the filtering by simply comparing it to the previous row.
Upvotes: 1