Reputation: 415
I have seen Dimension Lookup/Update documentation here and a few other blogs. But I cannot seem to get a clear idea. I have a table with the following structure:
Key Name Code Status IN Out Active
The key name code status active
comes from a csv file .
I need to use Dimension lookup/update step for scd type2 and populate the IN/Out
.
After setting up the connection details,
I have set the Keys to KEY
and the Fields to all the other fields with the option Date of last insert (without stream field as source)
. I need to create a new row in the database if there is change in any of the other fields. That row would have the same key
and the updated details with the out
set to infinity and the in
set to current system date.
Date Range start field is set to IN
and Table Date range End is set to OUT
column of database .
I don't understand the concept of this technical key as the key also comes from the csv file .
When I click on preview there is an error:
DB2 SQL error: SQLCODE: -407, SQLSTATE: 23502, SQLERRMC:
Please let me know for more details and any step or setting I must have missed.
Upvotes: 2
Views: 4986
Reputation: 1764
The key points of Dimension Lookup / Update
step (used in Update mode) when using for building SCD II. table are:
id
column from your source data (I guess its Key
from your CSV file). It is used to look up previously stored rows with same Key
, so the step can compare incoming row to rows already stored in SCD II. and evaluate the row was changed or not.technical_key
). Also new within your PDI record stream (named it same as it is in your table technical_key
). Set it to auto increment
. It will auto increment itself when inserting any new row into the table and its value is unique within the table (so can be used as table's primary key). last_updated_date
column from your source data which changes its value to actual date every time the row record is updated in source data. Or you can use actual time from when the transformation is executed (obtained from Get System Info
.system date
step).Date range start field, Table date range and: Every row in SCD II. table needs to have validity period (a scope within row data is valid). This period is defined by two dates - start (Date range start field
) and end date (Table date range end
). These two fields you set to IN
and Out
(name of the table columns). The step will automatically determine their values (set up Min. year = 1900
and Max. year = 2199
) using "Range start, end
" and Stream Datefield
values:
Key
) is new:
Key = 1; technical_key = 123; In = 1900-01-01; Out = 2199-12-31; Name = X
Key = 1; technical_key = 123; In = 1900-01-01; Out = 2015-03-13; Name = X
Key = 1; technical_key = 158; In = 2015-03-13; Out = 2199-12-31; Name = A
Fields - Update fields: Here you define all the data fields you want to store: Name Code Status Active
. As a Type of dimension update
set Insert
(used for SCD II. attributes).
Date of last insert (without stream field as source)
cannot be used because it only simply writes out actual date-time to particular dimension field
and you cannot define stream field
along with this option.Punch through
option can be used when you are not interested about the attribute's history (SCD I.). It overwrites all occurrences among the rows with the same Key
. E.g. Punch through
on Name
(New value 'A'):
Key = 1; technical_key = 123; In = 1900-01-01; Out = 2015-03-13; Name = A
Key = 1; technical_key = 158; In = 2015-03-13; Out = 2199-12-31; Name = A
Upvotes: 6