DerKamiKatze
DerKamiKatze

Reputation: 37

Extract parts of first row in CSV and create columns

Following situation:

The "Metadata" has the following schema:

String1 - String2 - INT1/INT2/INT3 / String3 - String4 - String5 (String6) String7

Sample:

A B C D E F G H I J K L
"Metadata"
Header1 Header2 Header3 Header4 Header5 Header6 Header7 Header8 Header9 Header10 Header11 Header12
"Data" "Data" "Data" "Data" "Data" "Data" "Data" "Data" "Data" "Data" "Data" "Data"
"Data" "Data" "Data" "Data" "Data" "Data" "Data" "Data" "Data" "Data" "Data" "Data"

My Goals:

Create an Data Flow:

So at the end there is an table with column headers from row 2, data beginning from row 3, three new columns based on parts of "Metadata" and an dynamic filename based on parts from "Metadata":

String6.String7_String3.csv:

Header1 Header2 Header3 Header4 Header5 Header6 Header7 Header8 Header9 Header10 Header11 Header12 "AdditionalColumn1" "AdditionalColumn2" "AdditionalColumn3"
"Data" "Data" "Data" "Data" "Data" "Data" "Data" "Data" "Data" "Data" "Data" "Data" "String3" "String6" "String7"
"Data" "Data" "Data" "Data" "Data" "Data" "Data" "Data" "Data" "Data" "Data" "Data" "String3" "String6" "String7"

Upvotes: 1

Views: 1499

Answers (1)

Joseph  Xu
Joseph Xu

Reputation: 6043

update:
I've created a test. It can also solve the problem.
This is my data source: enter image description here

My debug result of my test is as follows, it will generate one csv file:
enter image description here

  1. The data source is to the csv file, please don't select First row as header. enter image description here

The First row as header is set false: enter image description here

  1. Then at the SurrogateKey1 activity, I set the values as follows:
    enter image description here

It will add a row_no column, in the form of a number incremented by 1: enter image description here

  1. Then I use ConditionalSplit1 activity to split the source data to 3 types data flow: metadata, headers, data. enter image description here

  2. At the Select1 activity, I only select _col0_ and name as column0. enter image description here The data preview is like this: enter image description here

  3. Then I use DerivedColumn1 activity to create 3 new columns:

ADDITIONALCOLUMN1 => split(split({_col0_},'-')[3],'/')[4]
ADDITIONALCOLUMN2 => split(split(split({_col0_},'-')[5],'(')[2],')')[1]
ADDITIONALCOLUMN3 => split(split(split({_col0_},'-')[5],'(')[2],')')[2]

enter image description here The data preview is like this: enter image description here

  1. At the Select2 activity, I only selected the 3 new columns. enter image description here The 3 new columns data preview is like this:
    enter image description here

  2. At DerivedColumn2 activity, I created the FileName column: concat(ADDITIONALCOLUMN2,'.',ADDITIONALCOLUMN3,'_',ADDITIONALCOLUMN1,'.csv') enter image description here The data preview is like this: enter image description here

  3. At Join2 activity, I use Cutomer(cross) and the Right stream is DerivedColumn1, so we can add the data we generated at DerivedColumn2 activity at the tail. enter image description here The data preview is like this: enter image description here

  4. At DerivedColumn3 activity, I created 3 columns manually (since the Union activity will be done next, the number of columns must be the same). The FileName column expression is concat(ADDITIONALCOLUMN2,'.',ADDITIONALCOLUMN3,'_',ADDITIONALCOLUMN1,'.csv') enter image description here
    The data preview is like this:
    enter image description here

  5. At Select6 activity, we can select the columns we need.
    The setting is as follows: enter image description here
    The data preview is like this: enter image description here

  6. Then we jump to Join1 activity, the setting is as follows: enter image description here The data preview is like this, we can see Custom (cross) will add the data we generated at DerivedColumn2 activity at the tail: enter image description here

  7. At Select4 activity, I selected all the columns: enter image description here The data preview is like this: enter image description here

13.Then we jump to Union1 activity: enter image description here The data preview is like this: enter image description here

  1. At Select5 activity(This step can be omitted, skip to sink1 activity directly), I selected all the columns. We can skip this step. enter image description here

  2. At sink1 activity. The setting is as follows, it will generate a csv file with this name: enter image description here The data preview is like this: enter image description here

That's all.

Upvotes: 2

Related Questions