Deluxplanet345
Deluxplanet345

Reputation: 65

How to read an excel with dynamic column names in Pentaho

I have been struggling to read an excel file whose column number and names keep on changing. I have read about Metadata Injection but unable to apply it in my scenario, I am definitely missing something. My requirement is:

  1. I have a directory where Excel files keep coming in. Job is scheduled to run 2-3 times a day to pick the excel file in the folder and then post the Excel file data into SQL Server.

  2. The challenge is that the Excel files that come in this directory may have different number of columns ..i.e the column numbers are dynamic and also the column names.

I use Microsoft Excel Input and in order to refresh the column names, every time I have to click on 'Get fields from header row' button. I want to automate it and just have one transformation which can read any excel with dynamic column names.

Screenshot

In most of the examples of ETL Metadata Injection, I notice that the column names are provided. However, in my scenario I do not know in advance the column names or even the number of columns my excel might have.

Is this possible through the ETL Metadata Injection step or any through any other way?

Any help appreciated.

Thanks, Sarthak

Upvotes: 0

Views: 1408

Answers (1)

Ana GH
Ana GH

Reputation: 1700

Are you able to get a second file apart from the excel in a text format such as CSV or a text with a special character to separate the columns, just with the headers? Or have the excel generated as a csv instead of excel.

This way you could have a transformation to read the first line or the file just with the headers to read the columns, and then inject that information to read the excel or the csv.

Using a text file instead of an excel allows you to read the first line as a text, and then use the Split field to rows step to get the columns of that first line.

Upvotes: 0

Related Questions