Reputation: 20090
I have a data source where each row has five fields:
company name; year; code; value;
In my target output row model I want to produce a row like so
company name;year;value1;value2;value3;value4
Where value1,value..N are not concatenation for a single code but rather a "mapping". I.e. code 50 => "Total Revenues"
So I need to perform the following logic:
This is a in-memory map reduce with about 1M rows. How should this be handled in Talend Open Studio for Data Integration?
Upvotes: 2
Views: 2455
Reputation: 433
You could do it like that, and it's essentially the approach Maira Bay already suggested:
tFixedFlowInput
for that. You'd probably have to read from a file.tSortRow
.tMap
the value of each line to the corresponding column in the result line with a guard clause like input.code.equals("code for this column") ? input.value : null
.tAggregateRow
, grouping by company name and year, selecting the first value for each of the value rows - but make sure to ignore the nulls.I tried that with some sample data, hence the tFixedFlowInput
in step 1, and it worked for me on my machine in TOS 6.3.1.
Beware: the solution proposed assumes you only got one value per combination of company name, year and code.
Upvotes: 2
Reputation: 300
select all the records for the same company name / year
You might want to use tAggregate
(https://help.talend.com/display/TalendOpenStudioComponentsReferenceGuide521EN/18.1+tAggregateRow) to group the flow by company name and year
apply some custom java logic that performs the mapping between my codes and my fields of the output row.
Talend has a component called tMap
that allows you to map input fields into output fields.
In your tMap
you can use something like:
(assuming that input
is the name of the flow into your tMap
and output
is the name of your flow out of your tMap
)
In output.field1
put input.code == 50? input.value : 0
In output.field2
put input.code == 60? input.value : 0
In output.field2
put input.code == 70? input.value : 0
etc
This is assuming you are ok with leaving the field
columns with 0
if the value
was for another code
.
If you want the value
for each code
to be in a different output row out of the tMap
you can use a logic similar to the above, only putting each test (code == 70? input.value : 0
) in a different output table, and then filtering out the rows that have 0
(using a tFilter
) after the tMap
.
To add output tables you can use the +
symbol on the top right of the tMap
.
See here for more detials on how to use tMap
: https://help.talend.com/display/TalendOpenStudioComponentsReferenceGuide54EN/tMap
I hope this helps!
Upvotes: 2
Reputation: 1144
See solution below which I believe will fulfill your precise requirement of taking a delimited file data source and transforming it into a denormalized out as specified above.
First I mocked up a file with the same format as you specified. I made the values a logical concatenation of Company, Year, and sequence. This makes it easy to verify the output.
Next I use that as an input, run it thru a sorter, then denormalize on the value field. Finally you can see the output in a tLogRow
.
I also included the component view of tDenormalize
so you can see how that is done. You can use this technique in any falvor of Talend Open Studio
.
Upvotes: 0