Reputation: 688
Brand new to Pentaho (and a newbie SO poster so look out!)
I'd like to use Kettle/PDI to transform data coming in from an RDBMS from this (for example):
Question1 Question2 Question3 Question4 1/1/13 123.00 Test 1 Test 1.1 1/2/13 124.00 Test 2 Test 1.2 1/3/13 125.00 Test 3 Test 1.3 1/4/13 126.00 Test 4 Test 1.4 1/5/13 127.00 Test 5 Test 1.5
to this:
QuestionName AnswerDate AnswerNumber AnswerString Question1 1/1/13 Question1 1/2/13 Question1 1/3/13 Question1 1/4/13 Question1 1/5/13 Question2 123.00 Question2 124.00 Question2 125.00 Question2 126.00 Question2 127.00 Question3 Test 1 Question3 Test 2 Question3 Test 3 Question3 Test 4 Question3 Test 5 Question4 Test 1.1 Question4 Test 1.2 Question4 Test 1.3 Question4 Test 1.4 Question4 Test 1.5
As hopefully reflected above, there should be an "Answer<FieldDataType>" column for each available datatype in the original table. Is this possible with PDI? If so, can someone provide me with some pointers? I've tried using the Row Normaliser step to pivot the table and assign the new fields, but am probably not doing things quite right (or there is a bug [PDI 4.4]).
Upvotes: 7
Views: 9953
Reputation: 1
The Row Normalizer is very sensitive to the order you specify the de-normalization.
I had a sparse matrix input and discovered the following rules:
Thus if, in the example given you specified
Fieldname Type new field
Question1 date AnswerDate
Question2 number AnswerNumber
Question3 string AnswerString
Question4 string AnswerString
will work better than
Fieldname Type new field
Question1 date AnswerDate
Question3 string AnswerString
Question2 number AnswerNumber
Question4 string AnswerString
Upvotes: 0
Reputation: 1
use javascript step:
trans_Status = SKIP_TRANSFORMATION;
var row1 = createRowCopy(4);
var row2 = createRowCopy(4);
var row3 = createRowCopy(4);
var row4 = createRowCopy(4);
row1[0] = 'Question1';
row2[1] = 'Question2';
row3[2] = 'Question3';
row4[3] = 'Question4';
row1[1] = Question1;
row2[2] = Question2;
row3[3] = Question3;
row4[3] = Question4;
putRow(row1);
putRow(row2);
putRow(row3);
putRow(row4);
don't forget add fields;
Upvotes: 0
Reputation: 54
I accomplished this by using a scripting step to write an output row containing the column and value for each column in the input row. From there, I went to a Regex Evaluation step and used multiple capture groups to map the value types to additional columns in the stream. I messed around with the Row Normaliser for a while, but couldn't get it to do exactly what I wanted. The performance loss of using a scripting step was negligible.
Upvotes: 0