Reputation: 73
I have multiple steps in my transformation where i join data from different streams (files). All streams have a common field which is ID_Time.
Now when i try to join 2 streams on ID_Time, lets say using Multiway Merge Join, I get fields ID_Time and ID_Time_1. I'd like to merge those into one, so we get rid of the _1 column. If we do not have the same columns in both streams, I would like the ones that we do not have to be null.
An example:
ID Number1 Number2 ID_Time
1 5,06 215 12154
2 5121 121 151
ID CarModel CarManufacturer ID_Time
1 CX3 Mazda 12
2 V40 Volvo 39
So i would like this as the end result:
ID Number1 Number2 CarModel CarManufacturer ID_Time
1 5,06 215 null null 12154
2 5121 121 null null 151
3 null null CX3 Mazda 12
4 null null V40 Volvo 39
Maybe I have to use a different join? Or on different keys?
Upvotes: 0
Views: 2020
Reputation: 4544
The output of both Merge join
and Multiway Merge join
will always include all fields from all streams; if fields are repeated they will be suffixed with _1
, _2
, etc.
One obvious case is when the join keys have the same name (which if often the case), but you may also have other fields with the same name.
There are several ways to go about it, but the easiest would be a Select values
step after the join, and remove the duplicated fields (one of the tabs of Select Values is called "Remove", it will remove any fields you specify).
Your final Id field can be added after the merge, with a Add sequence
step.
Beware: always sort the inputs before joining, as both merge steps assume the input data to be sorted.
Upvotes: 1