Reputation: 834
I have two sources resulting from some transformation in data flow:
I have tried using join, it replicates the data no matter join I select it outputs similar stuff:
I have tried union as well but union either creates null in columns (if done by name) or rows (if done by position)
Shouldnt the join just concat the columns together because the IDs are same in both table.
This is how the desired ouput should look: I want concat the version column to the first source so that it looks like this:
ID name value version
111 file1 0.1 3
111 file2 0.82 15
111 file3 2.2 2
Upvotes: 0
Views: 1692
Reputation: 5074
Both of your source files have only one matching column (ID) and it is not unique. When you join both sources on the ID column, each row of source1 joins with all the matching rows of source2.
Here, your row1 (111) of source1 joins with all 3 matching rows (111) of source2, hence it results in 9 rows with different version values for each row in source1.
To get only 3 rows as your expected results, you need a unique matching row in each source.
rowNumber()
based on the ID column.Source1->window1:
Window1 data preview:
Source2->window2:
Window2 data preview:
Join data preview:
Select data preview:
Upvotes: 1
Reputation: 3838
That is expected with a join. For example, when you join tables in SQL, you also supply the target projection as part of the select statement. What you need to do here is add a Select transformation after your Join transformation. In there, you will reduce the projection to just the columns that would like to retain. You'll be able to choose which side (left or right) you would like to keep for the ID column.
Upvotes: 0