Reputation: 688

Converting Row Columns to Rows with Pentaho Kettle/PDI

Brand new to Pentaho (and a newbie SO poster so look out!)

I'd like to use Kettle/PDI to transform data coming in from an RDBMS from this (for example):

Question1   Question2   Question3   Question4
1/1/13      123.00      Test 1      Test 1.1
1/2/13      124.00      Test 2      Test 1.2
1/3/13      125.00      Test 3      Test 1.3
1/4/13      126.00      Test 4      Test 1.4
1/5/13      127.00      Test 5      Test 1.5

to this:

QuestionName AnswerDate AnswerNumber AnswerString
Question1    1/1/13     
Question1    1/2/13     
Question1    1/3/13     
Question1    1/4/13     
Question1    1/5/13     
Question2               123.00
Question2               124.00
Question2               125.00
Question2               126.00
Question2               127.00
Question3                             Test 1
Question3                             Test 2
Question3                             Test 3
Question3                             Test 4
Question3                             Test 5
Question4                             Test 1.1
Question4                             Test 1.2
Question4                             Test 1.3
Question4                             Test 1.4
Question4                             Test 1.5

As hopefully reflected above, there should be an "Answer<FieldDataType>" column for each available datatype in the original table. Is this possible with PDI? If so, can someone provide me with some pointers? I've tried using the Row Normaliser step to pivot the table and assign the new fields, but am probably not doing things quite right (or there is a bug [PDI 4.4]).

Upvotes: 7

Answers (3)

user3304285

Reputation: 1

The Row Normalizer is very sensitive to the order you specify the de-normalization.

I had a sparse matrix input and discovered the following rules:

The Type values must be grouped together, like with like
The new field column must be in the same order for each Type grouping
The Type groups must be arranged as most populous first, least populous last

Thus if, in the example given you specified

 Fieldname  Type   new field
 Question1  date   AnswerDate
 Question2  number AnswerNumber
 Question3  string AnswerString
 Question4  string AnswerString

will work better than

 Fieldname  Type   new field
 Question1  date   AnswerDate
 Question3  string AnswerString
 Question2  number AnswerNumber
 Question4  string AnswerString

Upvotes: 0

kamapcuc

Reputation: 1

use javascript step:

trans_Status = SKIP_TRANSFORMATION;
var row1 = createRowCopy(4); 
var row2 = createRowCopy(4); 
var row3 = createRowCopy(4); 
var row4 = createRowCopy(4); 
row1[0] = 'Question1';
row2[1] = 'Question2';
row3[2] = 'Question3';
row4[3] = 'Question4';
row1[1] = Question1;
row2[2] = Question2;
row3[3] = Question3;
row4[3] = Question4;
putRow(row1);
putRow(row2);
putRow(row3);
putRow(row4);

don't forget add fields;

Upvotes: 0

nevern02

Reputation: 54

I accomplished this by using a scripting step to write an output row containing the column and value for each column in the input row. From there, I went to a Regex Evaluation step and used multiple capture groups to map the value types to additional columns in the stream. I messed around with the Row Normaliser for a while, but couldn't get it to do exactly what I wanted. The performance loss of using a scripting step was negligible.

Upvotes: 0

Converting Row Columns to Rows with Pentaho Kettle/PDI

Answers (3)

Related Questions