justSaid
justSaid

Reputation: 1640

Reading and splitting a CSV File with Talend

The csv file contains more than one table, it might look like this:

 "Table 1"
 ,
 "id","visits","downloads","emailsent"
  1, 4324, 23, 2
  2, 664, 42, 1
  3, 73, 44, 0
  4, 914, 8, 0
  ...

 "Table 2"
 ,
 "id_of_2nd_tab","visits_of_2nd_tab","downloads_of_2nd_tab"
  1, 524, 3
  2, 564, 52
  3, 63, 84
  4, 814, 8
  ...

What is the best way to import those tables into Talend ?

Upvotes: 0

Views: 1827

Answers (2)

dbh
dbh

Reputation: 1627

Generally, that kind of multi-record format CSV format is more complex to parse.

Question : Are there are a finite number of tables? Question: Does each table have a fixed number and order of columns? Question: What is the separator between "tables" within the CSV?

I believe you need to take a multi-pass approach. You could do something like this.

Pass #1 - Use tFileInputDelimited Use row separator such as "Table", No field separator, Grab 1 big field Alternatively, you could split the first file into separate files at this stage.

Pass #2 - Split Row (on results from pass #1) on the Row Separator "\r\n" etc Split it into multiple rows but of a single column.

Pass #3 - Extract Delimited Fields (on results from pass #2) Extract based on a field separator Recognize a "Table" row Recognize a "Header row Additional handling per Table / set of fields in header

Upvotes: 1

PatVictorio
PatVictorio

Reputation: 41

Use the a tFileInputExcel component to read each worksheet. Then you can use tMap to join the worksheets into a target column layout assuming you want do some processing on a joined set of columns.

Upvotes: 0

Related Questions