Vanbuskirk
Vanbuskirk

Reputation: 23

ADF - How to copy an Excel Sheet with Multiple Sheets into separate .csv files

I currently have an Excel file that has multiple worksheets (over 11). This Excel file currently lives in a remote file server. I am trying to use Azure Data FactoryV2 to copy the Excel file and split each worksheet as its own .csv file within an ADLS Gen2 folder. The reason for this is because not every tab has the same schema and I want to only select the valid ones later.

I currently have an ADF dataset pointing to the Excel dataset correctly and have created a parameter for the sheet name using @dataset.SheetName. I am not sure where to go next. After creating a new pipeline I have tried nesting a Copy Activity inside a ForEach activity, however, it asks for the SheetName value.

How do I construct this pipeline to grab the names of the worksheets existing in the Excel file and then iterate a copy activity for each sheet? I cannot assume I will know the sheet names or how many sheets there will be. I would prefer to avoid creating multiple datasets for the Excel file if possible.

Any insight would be appreciated.

Upvotes: 2

Views: 6337

Answers (1)

Joseph  Xu
Joseph Xu

Reputation: 6043

Get list of Excel sheet names in ADF is not support yet and you can vote here.

  1. So you can use azure funcion to get the sheet names.
import pandas

xl = pandas.ExcelFile('data.xlsx')

# see all sheet names
print(xl.sheet_names )

enter image description here

  1. Then use an Array type variable in ADF to get and traverse this array. enter image description here

Upvotes: 1

Related Questions