Gonzalo.-

Reputation: 12672

SSIS - Reuse Ole DB source when matching Fact against lookup table twice

I am pretty new to SSIS and BI in general, so first of all sorry if this is a newbie question.

My source data for the fact table is in a CSV file, and I want to match its ids against the surrogate keys in lookup tables.

The data structure in the csv is like this

... userId, OriginStationId, DestinyStationId,..

What I am trying to accomplish is to match the data against my lookup table. So what I am doing is

  1. Reading Lookup data using OLE DB Source
  2. Reading my csv file
  3. Sorting both inputs by the same field
  4. Doing a left join by Id, in order to get the SK

This way, if there is no match (i.e. the surrogate key can't be found), I can redirect the row to a rejected CSV and handle it later.

something like this:

[Screenshot: Join]

(sorry for the Spanish!)

I am doing this for each dimension, so I can handle each one with different error codes.

Since OriginStationId and DestinyStationId are two values from the same dimension (they both match against the same lookup table), I wanted to know if there's a way to avoid reading the data from the table twice (that is, to avoid using two OLE DB sources against the same table).

I tried adding a second output to the Sort, but I am not allowed to. The same goes for adding another output to the OLE DB Source.

I see there's a "cache" option. Is that the best way to go? (Although it would still imply creating another OLE DB source anyway, right?)

The third option I thought of was joining on both fields, but since the lookup table has only one matching field, I get an error when I try to map both columns from my CSV to that same column in the lookup table:

There are columns missing with the sort order 2 to 2

What is the best way to go about this? Or am I thinking about this incorrectly? If something is not clear, let me know and I'll update my question.

Upvotes: 3

Views: 390

Answers (2)

John

Reputation: 347

Gonzalo

I have just used this article on deriving columns when building a data warehouse: How to Populate a Fact Table using SSIS (part 1).

Using this I built a simple package that reads a CSV file with two columns that are used to derive separate values from the same CodeTable. The CodeTable has two fields Id and Description.

The Data Flow has two Lookup transformations. The first one joins the attribute Lookup1 against the Description to derive its Id. The second joins the attribute Lookup2 against the same Description column to derive a different Id.
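To make the idea concrete outside the SSIS designer, here is a minimal sketch in plain Python (not SSIS code) of two lookups resolving different CSV columns against one reference table. The sample rows and the helper `derive_ids` are hypothetical and only there for illustration; the column names follow the ones used in this answer.

```python
# Illustrative only: a plain-Python analogue of two lookups that share
# one reference table. Sample values are made up.

code_table = [
    {"Id": 1, "Description": "Alpha"},
    {"Id": 2, "Description": "Beta"},
]

# Reference data indexed once by Description, loosely like a cached lookup.
id_by_description = {row["Description"]: row["Id"] for row in code_table}

csv_rows = [
    {"Lookup1": "Alpha", "Lookup2": "Beta"},
    {"Lookup1": "Beta", "Lookup2": "Alpha"},
]


def derive_ids(row, cache):
    """Resolve both CSV columns against the same cached reference data."""
    out = dict(row)
    out["CodeTableFirstId"] = cache.get(row["Lookup1"])    # first lookup
    out["CodeTableSecondId"] = cache.get(row["Lookup2"])   # second lookup
    return out


result = [derive_ids(r, id_by_description) for r in csv_rows]
```

Each output row carries both derived Ids, and a value that has no match comes back as `None`, which is the point where a real package would route the row to a no-match output.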

Here is the Data Flow:-

[Screenshot: Data Flow]

Note the "Data Conversion" was required to convert the string attributes from the CSV file into "Unicode string [DT_WSTR]" so they could be joined to the nvarchar(50) description attribute in the table.

Here is the Data Conversion:-

[Screenshot: Data Conversion]

Here is the first Lookup (the second one joins "Copy of Lookup2" to the Description):-

[Screenshot: first Lookup]

Here is the Data Viewer output with the two derived Ids, CodeTableFirstId and CodeTableSecondId:-

[Screenshot: Data Viewer output]

Hopefully I understand your problem and this is of use to you.

Cheers John

Upvotes: 0

Tab Alleman

Reputation: 31775

Any time you wish you could have multiple outputs from a component that only allows one, all you have to do is follow that component with the Multicast component, whose sole purpose is to split a Data Flow stream into multiple outputs.
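As a rough illustration of that "read once, fan out" pattern, here is a plain-Python sketch (not SSIS code) of the asker's scenario: the station dimension is read a single time and the same rows feed two left joins, one per fact column, each with its own reject list. The function names, the `StationSK` column, the error codes and the sample data are all hypothetical.

```python
# Sketch of "read once, fan out". Names and sample data are hypothetical.

def read_station_dimension():
    """Stand-in for the single OLE DB Source; called exactly once."""
    return [
        {"StationId": 10, "StationSK": 1},
        {"StationId": 20, "StationSK": 2},
    ]


def left_join(fact_rows, dim_rows, fact_key, sk_name, error_code):
    """Left-join facts to the dimension; split matched and rejected rows."""
    sk_by_id = {d["StationId"]: d["StationSK"] for d in dim_rows}
    matched, rejected = [], []
    for row in fact_rows:
        sk = sk_by_id.get(row[fact_key])
        if sk is None:
            # No surrogate key found: redirect with a per-dimension code.
            rejected.append({**row, "ErrorCode": error_code})
        else:
            matched.append({**row, sk_name: sk})
    return matched, rejected


stations = read_station_dimension()  # one read of the lookup table
origin_branch = stations             # plays the role of Multicast output 1
destiny_branch = stations            # plays the role of Multicast output 2

facts = [
    {"userId": 1, "OriginStationId": 10, "DestinyStationId": 20},
    {"userId": 2, "OriginStationId": 10, "DestinyStationId": 99},
    {"userId": 3, "OriginStationId": 77, "DestinyStationId": 20},
]

with_origin, bad_origin = left_join(
    facts, origin_branch, "OriginStationId", "OriginStationSK", "E_ORIGIN")
loaded, bad_destiny = left_join(
    with_origin, destiny_branch, "DestinyStationId", "DestinyStationSK", "E_DESTINY")
```

In the actual package, Merge Join needs both inputs sorted on the join key, so the fact stream would have to be re-sorted on DestinyStationId before the second join. The dimension side is sorted on the same Id for both joins, so placing the Multicast after its Sort should let both branches reuse that sorted output.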

Upvotes: 3
