Rasika Vijay
Rasika Vijay

Reputation: 415

Pentaho Dimension lookup/update

I have seen Dimension Lookup/Update documentation here and a few other blogs. But I cannot seem to get a clear idea. I have a table with the following structure:

 Key Name Code Status IN Out Active

The key name code status active comes from a csv file . I need to use Dimension lookup/update step for scd type2 and populate the IN/Out.

After setting up the connection details, I have set the Keys to KEY and the Fields to all the other fields with the option Date of last insert (without stream field as source). I need to create a new row in the database if there is change in any of the other fields. That row would have the same key and the updated details with the out set to infinity and the in set to current system date.

Date Range start field is set to IN and Table Date range End is set to OUT column of database .

I don't understand the concept of this technical key as the key also comes from the csv file .

When I click on preview there is an error:

   DB2 SQL error: SQLCODE: -407, SQLSTATE: 23502, SQLERRMC: 

Please let me know for more details and any step or setting I must have missed.

Upvotes: 2

Views: 4986

Answers (1)

mzy
mzy

Reputation: 1764

The key points of Dimension Lookup / Update step (used in Update mode) when using for building SCD II. table are:

  • Keys - Key fields: Here you define an id column from your source data (I guess its Key from your CSV file). It is used to look up previously stored rows with same Key, so the step can compare incoming row to rows already stored in SCD II. and evaluate the row was changed or not.
  • Technical key field: Technical key is extra new column you need to add to your table (e.g. technical_key). Also new within your PDI record stream (named it same as it is in your table technical_key). Set it to auto increment. It will auto increment itself when inserting any new row into the table and its value is unique within the table (so can be used as table's primary key).
  • Stream Datefield: Usually here you put last_updated_date column from your source data which changes its value to actual date every time the row record is updated in source data. Or you can use actual time from when the transformation is executed (obtained from Get System Info.system date step).
  • Date range start field, Table date range and: Every row in SCD II. table needs to have validity period (a scope within row data is valid). This period is defined by two dates - start (Date range start field) and end date (Table date range end). These two fields you set to IN and Out (name of the table columns). The step will automatically determine their values (set up Min. year = 1900 and Max. year = 2199) using "Range start, end" and Stream Datefield values:

    • When the row (identified by Key) is new:
      • Key = 1; technical_key = 123; In = 1900-01-01; Out = 2199-12-31; Name = X
    • E.g. Next day. The same row updated (Stream Datefield's value = '2015-03-13'):
      • Key = 1; technical_key = 123; In = 1900-01-01; Out = 2015-03-13; Name = X
      • Key = 1; technical_key = 158; In = 2015-03-13; Out = 2199-12-31; Name = A
  • Fields - Update fields: Here you define all the data fields you want to store: Name Code Status Active. As a Type of dimension update set Insert (used for SCD II. attributes).

    • The Date of last insert (without stream field as source) cannot be used because it only simply writes out actual date-time to particular dimension field and you cannot define stream field along with this option.
    • The Punch through option can be used when you are not interested about the attribute's history (SCD I.). It overwrites all occurrences among the rows with the same Key. E.g. Punch through on Name (New value 'A'):
      • Key = 1; technical_key = 123; In = 1900-01-01; Out = 2015-03-13; Name = A
      • Key = 1; technical_key = 158; In = 2015-03-13; Out = 2199-12-31; Name = A

Pentaho Dimension lookup/update

Upvotes: 6

Related Questions