Extract parts of first row in CSV and create columns

Question

Following situation:

I have an .csv file as input
The .csv is an export from an tool
The export generates an "Metadata" entry in row1, column1
Row headers of the data are in row2
data starts from row 3

The "Metadata" has the following schema:

String1 - String2 - INT1/INT2/INT3 / String3 - String4 - String5 (String6) String7

Sample:

A	B	C	D	E	F	G	H	I	J	K	L
"Metadata"
Header1	Header2	Header3	Header4	Header5	Header6	Header7	Header8	Header9	Header10	Header11	Header12
"Data"	"Data"	"Data"	"Data"	"Data"	"Data"	"Data"	"Data"	"Data"	"Data"	"Data"	"Data"
"Data"	"Data"	"Data"	"Data"	"Data"	"Data"	"Data"	"Data"	"Data"	"Data"	"Data"	"Data"

My Goals:

Create an Data Flow:

Extract the "Metadata", split it up by the Delimiters "/", " - ", "(" and ")" and make the each parts usable in derived columns and sink filename
Create three new columns, with column headers based on "String3", "String6" and "String7"
Create dynamic sink filename based on "Metadata": "String6"."String7"_"String3".csv

So at the end there is an table with column headers from row 2, data beginning from row 3, three new columns based on parts of "Metadata" and an dynamic filename based on parts from "Metadata":

String6.String7_String3.csv:

Header1	Header2	Header3	Header4	Header5	Header6	Header7	Header8	Header9	Header10	Header11	Header12	"AdditionalColumn1"	"AdditionalColumn2"	"AdditionalColumn3"
"Data"	"Data"	"Data"	"Data"	"Data"	"Data"	"Data"	"Data"	"Data"	"Data"	"Data"	"Data"	"String3"	"String6"	"String7"
"Data"	"Data"	"Data"	"Data"	"Data"	"Data"	"Data"	"Data"	"Data"	"Data"	"Data"	"Data"	"String3"	"String6"	"String7"

Joseph Xu · Accepted Answer

update:
I've created a test. It can also solve the problem.
This is my data source:

My debug result of my test is as follows, it will generate one csv file:

The data source is to the csv file, please don't select First row as header.

The First row as header is set false:

Then at the SurrogateKey1 activity, I set the values as follows:

It will add a row_no column, in the form of a number incremented by 1:

Then I use ConditionalSplit1 activity to split the source data to 3 types data flow: metadata, headers, data.
At the Select1 activity, I only select _col0_ and name as column0. The data preview is like this:
Then I use DerivedColumn1 activity to create 3 new columns:

ADDITIONALCOLUMN1 => split(split({_col0_},'-')[3],'/')[4]
ADDITIONALCOLUMN2 => split(split(split({_col0_},'-')[5],'(')[2],')')[1]
ADDITIONALCOLUMN3 => split(split(split({_col0_},'-')[5],'(')[2],')')[2]

The data preview is like this:

At the Select2 activity, I only selected the 3 new columns. The 3 new columns data preview is like this:
At DerivedColumn2 activity, I created the FileName column: concat(ADDITIONALCOLUMN2,'.',ADDITIONALCOLUMN3,'_',ADDITIONALCOLUMN1,'.csv') The data preview is like this:
At Join2 activity, I use Cutomer(cross) and the Right stream is DerivedColumn1, so we can add the data we generated at DerivedColumn2 activity at the tail. The data preview is like this:
At DerivedColumn3 activity, I created 3 columns manually (since the Union activity will be done next, the number of columns must be the same). The FileName column expression is concat(ADDITIONALCOLUMN2,'.',ADDITIONALCOLUMN3,'_',ADDITIONALCOLUMN1,'.csv')
The data preview is like this:
At Select6 activity, we can select the columns we need.
The setting is as follows:
The data preview is like this:
Then we jump to Join1 activity, the setting is as follows: The data preview is like this, we can see Custom (cross) will add the data we generated at DerivedColumn2 activity at the tail:
At Select4 activity, I selected all the columns: The data preview is like this:

13.Then we jump to Union1 activity: The data preview is like this:

At Select5 activity(This step can be omitted, skip to sink1 activity directly), I selected all the columns. We can skip this step.
At sink1 activity. The setting is as follows, it will generate a csv file with this name: The data preview is like this:

That's all.

Extract parts of first row in CSV and create columns

Answers (1)

Related Questions