Reputation: 49
How does the Joiner improve performance when we use the source with less data as the master and the source with more records as the detail? How does the Joiner build its cache, and why do we call the Joiner transformation a blocking transformation? Could anyone please clarify?
Upvotes: 0
Views: 515
Reputation: 7387
Why the master should contain fewer rows:
The Integration Service reads all the records from the master source and builds the index and data caches. After building the caches, it reads records from the detail source and performs the joins against the cache.
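This means keeping the number of master rows low is a good idea, because it minimizes both the cache size and the time needed to build the cache. The detail rows are only compared against the cache, not stored in it, so a large detail source costs little extra memory.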
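To make the cost argument concrete, here is a minimal Python sketch of the build-then-probe idea, assuming an in-memory inner join. It is not Informatica's implementation; the function `hash_join` and the dict standing in for the index and data caches are hypothetical names used only for illustration.

```python
# Illustrative sketch of a hash join; NOT Informatica's actual implementation.
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple

Row = Dict[str, Any]


def hash_join(master: Iterable[Row], detail: Iterable[Row], key: str) -> Tuple[List[Row], int]:
    """Inner join: cache every master row, then stream the detail rows past the cache.

    Returns the joined rows and the number of rows that had to be cached.
    """
    # Build phase: the whole master source is loaded into memory.
    # The dict keys stand in for the index cache, the stored rows for the data cache.
    cache: Dict[Any, List[Row]] = defaultdict(list)
    cached_rows = 0
    for m in master:
        cache[m[key]].append(m)
        cached_rows += 1

    # Probe phase: each detail row is looked up once and is never cached.
    joined: List[Row] = []
    for d in detail:
        for m in cache.get(d[key], []):
            joined.append({**m, **d})
    return joined, cached_rows
```

With 3 department rows as master and 6 employee rows as detail, `cached_rows` is 3; swapping the roles produces the same 6 joined rows but caches 6 rows instead.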
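Why the Joiner blocks the pipeline follows from the same behavior. It has to read and cache all master rows first, and while it does so it blocks the detail source. Only after every master row is cached does it unblock the detail source, read the detail rows, and join them against the cache. Because one input pipeline has to wait on the other, the Joiner is called a blocking transformation.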
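The ordering can be observed with Python generators. This is a hypothetical sketch, not Informatica code: `logged` and `blocking_join` are made-up names, and it assumes that joined rows are emitted as soon as a detail row finds a match. The log records that no detail row is read until the master input is exhausted.

```python
# Illustrative sketch of why the join blocks; NOT Informatica's actual implementation.
from typing import Any, Dict, Iterator, List

Row = Dict[str, Any]


def logged(name: str, rows: List[Row], log: List[str]) -> Iterator[Row]:
    # Record every read so the order in which sources are consumed is visible.
    for i, row in enumerate(rows):
        log.append(f"read {name} {i}")
        yield row


def blocking_join(master: Iterator[Row], detail: Iterator[Row], key: str,
                  log: List[str]) -> Iterator[Row]:
    # Phase 1: the detail source is not touched until every master row is cached.
    cache: Dict[Any, List[Row]] = {}
    for m in master:
        cache.setdefault(m[key], []).append(m)
    log.append("master cached, detail unblocked")

    # Phase 2: detail rows are read one at a time and matches are emitted right away.
    for d in detail:
        for m in cache.get(d[key], []):
            log.append("emit")
            yield {**m, **d}
```

Asking for the first output row forces both master rows to be read before the first detail row, which is the blocking behavior described above.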
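This is the behavior for unsorted input. For sorted input, the index and data caches are built differently, based on the join condition: because both inputs arrive ordered on the join keys, the Joiner does not need to cache the entire master source up front, which makes things faster.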
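A sort-merge join shows why ordered input shrinks the cache. The sketch below assumes both inputs are already sorted on the join key and keeps only the master rows for the current key in memory. The function `sorted_join` is hypothetical and illustrates the general technique only; it does not describe how Informatica actually sizes its sorted-input caches.

```python
# Illustrative sort-merge join sketch; NOT Informatica's actual implementation.
from itertools import groupby
from operator import itemgetter
from typing import Any, Dict, Iterable, List, Tuple

Row = Dict[str, Any]


def sorted_join(master: Iterable[Row], detail: Iterable[Row], key: str) -> Tuple[List[Row], int]:
    """Inner join of two inputs already sorted on `key`.

    Returns the joined rows and the largest number of master rows cached at once.
    """
    get = itemgetter(key)
    master_groups = groupby(master, key=get)
    current = next(master_groups, None)  # (key value, rows for that key) or None
    cache_key, cache = None, []
    peak = 0
    joined: List[Row] = []
    for d in detail:
        k = get(d)
        # Advance the master side until its key is >= the detail key.
        while current is not None and current[0] < k:
            current = next(master_groups, None)
        if current is not None and current[0] == k:
            if cache_key != k:
                # Only the master rows that share the current key are cached.
                cache_key, cache = k, list(current[1])
                peak = max(peak, len(cache))
            for m in cache:
                joined.append({**m, **d})
    return joined, peak
```

With a 4-row master whose largest key group has 2 rows, the cache never holds more than 2 rows, whereas the unsorted approach would cache all 4.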
Upvotes: 1