Tomaz Leopold
Tomaz Leopold

Reputation: 73

Pentaho kettle join streams with different fields

I have multiple steps in my transformation where i join data from different streams (files). All streams have a common field which is ID_Time.

Now when i try to join 2 streams on ID_Time, lets say using Multiway Merge Join, I get fields ID_Time and ID_Time_1. I'd like to merge those into one, so we get rid of the _1 column. If we do not have the same columns in both streams, I would like the ones that we do not have to be null.

An example:

ID     Number1 Number2  ID_Time 
 1      5,06     215     12154
 2      5121     121      151


  ID    CarModel CarManufacturer ID_Time
  1       CX3         Mazda        12
  2       V40         Volvo        39

So i would like this as the end result:

ID Number1 Number2 CarModel CarManufacturer ID_Time
 1   5,06    215     null         null       12154
 2   5121    121     null         null        151
 3   null    null    CX3         Mazda        12
 4   null    null    V40         Volvo        39

Maybe I have to use a different join? Or on different keys?

Upvotes: 0

Views: 2020

Answers (1)

nsousa
nsousa

Reputation: 4544

The output of both Merge join and Multiway Merge join will always include all fields from all streams; if fields are repeated they will be suffixed with _1, _2, etc.

One obvious case is when the join keys have the same name (which if often the case), but you may also have other fields with the same name.

There are several ways to go about it, but the easiest would be a Select values step after the join, and remove the duplicated fields (one of the tabs of Select Values is called "Remove", it will remove any fields you specify).

Your final Id field can be added after the merge, with a Add sequence step.

Beware: always sort the inputs before joining, as both merge steps assume the input data to be sorted.

Upvotes: 1

Related Questions