Michelle Santos
Michelle Santos

Reputation: 267

Removing specific rows in an Excel file using Azure Data Factory

I have a set of excel files inside ADLS. The format looks similar to the one below:

enter image description here

The first 4 rows would always be the document header information and the last 3 will be 2 empty rows and the end of the document indicator. The number of rows for the employee information is indefinite. I would like to delete the first 4 rows and the last 3 rows using ADF.

Can any help me with what should be expressions in the Derived column / Select?

Upvotes: 0

Views: 2770

Answers (2)

ASH
ASH

Reputation: 20302

How about this? Under the 'Source' tab, choose the number of lines you want to skip.

enter image description here

Upvotes: 1

All About BI
All About BI

Reputation: 533

My Excel file:

ExcelFile

Source Data set settings (give A5 in range and select first row as header): SourceDataSetProperties

Make sure to refresh schema in the source data set. Schema

After schema refresh, if you preview the source data, you will be seeing all rows from row number 5. This will include footer too which we can filter in data flow.

Preview Next, add a filter transformation with below expression

  !startsWith(sno,'dummy') && sno!=''

this will filter out the rows starting with dummy, in your case, end of document. Also we are ignoring the empty rows by checking sno!=''

Final Preview after filter:

Preview After Filtering footer

Upvotes: 3

Related Questions