Reputation: 3922
I'm in the process of programatically building a fairly complex SSIS data flow using the EzAPI library and have hit a snag. I've attached an image of what I'm trying to achieve.
I've got the start (up to the merge join) working correctly but have come unstuck trying to map the input of the merge join component to the output.
Here's the code I have so far (just a snippet obviously)
int sortPosition;
var df = new EzDataFlow(p);
var cur = new EzOleDbSource(df);
cur.Connection = dstConn;
cur.Table = "Table1";
var hst = new EzOleDbSource(df);
hst.Connection = hstConn;
hst.Table = "Table2";
// Add all the columns to the sort transformation for the Current database table
var sortCurr = new EzSortTransform(df);
sortCurr.AttachTo(cur);
sortPosition = 1;
foreach (Column c in table.Columns)
{
sortCurr.SortOrder[c.ColumnName] = sortPosition++;
}
// Same for history
var sortHst = new EzSortTransform(df);
sortHst.AttachTo(hst);
sortPosition = 1;
foreach (Column c in table.Columns)
{
sortHst.SortOrder[c.ColumnName] = sortPosition++;
}
var mrg = new EzMergeJoin(df);
mrg.AttachTo(sortCurr, 0, 0);
mrg.AttachTo(sortHst, 0, 1);
mrg.JoinType = MergeJoinType.Full;
... now what?
i've searched high and low for an example or documentation for the EzMergeJoin transformation but with no avail. Can anyone point me in the right direction?
Upvotes: 1
Views: 4777
Reputation: 56
Unfortunately, we haven't found a way to fix the issue of join keys not being passed through the data flow. The only answer we have come up with is to select the join keys twice with different names in the upstream components. You then sort on only the duplicates, and this makes the merge join use the duplicates as the join keys. The original columns then pass through to the next component in the data flow.
Here's our code sample, nothing special once the input columns are set up correctly:
MergeJoin = new EzMergeJoin(TransformTranFact);
MergeJoin.Name = "Merge Join";
MergeJoin.AttachTo(SourceChecksum, 0, 0);
MergeJoin.AttachTo(FactTranFact, 0, 1);
MergeJoin.LinkAllInputsToOutputs();
MergeJoin.JoinType = MergeJoinType.Full;
MergeJoin.NumKeyColumns = mchs.Count();
mchs is an XElement object that contains information from a Data Warehouse design captured in Dimodelo Architect. We use EzAPI to automate the generation of complex ETL for a Data Warehouse directly from the Data Warehouse design captured in Dimodelo Architect.
Upvotes: 1
Reputation: 1
I was able to solve this by making a small modification to the LinkInputToOutput method in EzComponent. Here is the before/after:
Before:
if (InputColumnExists(inputIndex, colName))
return; // return as this column is already linked
IDTSInput100 input = m_meta.InputCollection[inputIndex];
IDTSVirtualInput100 virtualInput = input.GetVirtualInput();
IDTSVirtualInputColumn100 virtualInputColumn = virtualInput.VirtualInputColumnCollection[colName];
m_comp.SetUsageType(input.ID, virtualInput, virtualInputColumn.LineageID, DTSUsageType.UT_READONLY);
After:
IDTSInput100 input = m_meta.InputCollection[inputIndex];
IDTSVirtualInput100 virtualInput = input.GetVirtualInput();
IDTSVirtualInputColumn100 virtualInputColumn = virtualInput.VirtualInputColumnCollection[colName];
if (InputColumnExists(inputIndex, colName))
m_comp.SetUsageType(input.ID, virtualInput, virtualInputColumn.LineageID, DTSUsageType.UT_IGNORED);
m_comp.SetUsageType(input.ID, virtualInput, virtualInputColumn.LineageID, DTSUsageType.UT_READONLY);
You still need to manually link the column(s) that exist in both source components like this:
mergeJoin.LinkInputToOutput(0, "KeyField");
Upvotes: 0
Reputation: 136
I figured out a non-optimal but pragmatic solution for now. If you select the join key column twice in one of your queries but only set the sort option on one of them, it will link to the output of the merge join.
Upvotes: 0
Reputation: 61211
As an initial answer try adding
// Not sure how to handle multiple key columns
mrg.NumKeyColumns = 1;
mrg.LinkAllInputsToOutputs();
My dataflow was setup as Current
branch looks like
Historical
branch looks like
I only sorted on the initial column as my desire was to see an output like source_id, col1, col2.
I'm still looking at the code to see how to add the join key in there but haven't found it yet.
Upvotes: 0