Reputation: 37
Following situation:
The "Metadata" has the following schema:
String1 - String2 - INT1/INT2/INT3 / String3 - String4 - String5 (String6) String7
Sample:
A | B | C | D | E | F | G | H | I | J | K | L |
---|---|---|---|---|---|---|---|---|---|---|---|
"Metadata" | |||||||||||
Header1 | Header2 | Header3 | Header4 | Header5 | Header6 | Header7 | Header8 | Header9 | Header10 | Header11 | Header12 |
"Data" | "Data" | "Data" | "Data" | "Data" | "Data" | "Data" | "Data" | "Data" | "Data" | "Data" | "Data" |
"Data" | "Data" | "Data" | "Data" | "Data" | "Data" | "Data" | "Data" | "Data" | "Data" | "Data" | "Data" |
My Goals:
Create an Data Flow:
So at the end there is an table with column headers from row 2, data beginning from row 3, three new columns based on parts of "Metadata" and an dynamic filename based on parts from "Metadata":
String6.String7_String3.csv:
Header1 | Header2 | Header3 | Header4 | Header5 | Header6 | Header7 | Header8 | Header9 | Header10 | Header11 | Header12 | "AdditionalColumn1" | "AdditionalColumn2" | "AdditionalColumn3" |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
"Data" | "Data" | "Data" | "Data" | "Data" | "Data" | "Data" | "Data" | "Data" | "Data" | "Data" | "Data" | "String3" | "String6" | "String7" |
"Data" | "Data" | "Data" | "Data" | "Data" | "Data" | "Data" | "Data" | "Data" | "Data" | "Data" | "Data" | "String3" | "String6" | "String7" |
Upvotes: 1
Views: 1499
Reputation: 6043
update:
I've created a test. It can also solve the problem.
This is my data source:
My debug result of my test is as follows, it will generate one csv file:
The First row as header is set false:
It will add a row_no column, in the form of a number incremented by 1:
Then I use ConditionalSplit1 activity to split the source data to 3 types data flow:
metadata
, headers
, data
.
At the Select1 activity, I only select _col0_
and name as column0
.
The data preview is like this:
Then I use DerivedColumn1 activity to create 3 new columns:
ADDITIONALCOLUMN1 => split(split({_col0_},'-')[3],'/')[4]
ADDITIONALCOLUMN2 => split(split(split({_col0_},'-')[5],'(')[2],')')[1]
ADDITIONALCOLUMN3 => split(split(split({_col0_},'-')[5],'(')[2],')')[2]
The data preview is like this:
At the Select2 activity, I only selected the 3 new columns.
The 3 new columns data preview is like this:
At DerivedColumn2 activity, I created the FileName column:
concat(ADDITIONALCOLUMN2,'.',ADDITIONALCOLUMN3,'_',ADDITIONALCOLUMN1,'.csv')
The data preview is like this:
At Join2 activity, I use Cutomer(cross) and the Right stream is DerivedColumn1, so we can add the data we generated at DerivedColumn2 activity at the tail.
The data preview is like this:
At DerivedColumn3 activity, I created 3 columns manually (since the Union activity will be done next, the number of columns must be the same). The FileName column expression is
concat(ADDITIONALCOLUMN2,'.',ADDITIONALCOLUMN3,'_',ADDITIONALCOLUMN1,'.csv')
The data preview is like this:
At Select6 activity, we can select the columns we need.
The setting is as follows:
The data preview is like this:
Then we jump to Join1 activity, the setting is as follows:
The data preview is like this, we can see Custom (cross) will add the data we generated at DerivedColumn2 activity at the tail:
At Select4 activity, I selected all the columns:
The data preview is like this:
13.Then we jump to Union1 activity:
The data preview is like this:
At Select5 activity(This step can be omitted, skip to sink1 activity directly), I selected all the columns. We can skip this step.
At sink1 activity. The setting is as follows, it will generate a csv file with this name:
The data preview is like this:
That's all.
Upvotes: 2