Austin Wismer
Austin Wismer

Reputation: 281

Using PowerQuery to pull multiple worksheets from multiple workbooks within multiple subdirectories

Essentially, I'm stuck with a hierarchy of .xls workbooks that I would like to compile using PowerQuery into a PowerPivot data model.

Master Directory - Retailer Folder #1 - Brand #1 Workbook - SKU Worksheet #1 - SKU Worksheet #2 - Brand #2 Workbook - Retailer Folder #2 - Brand #1 Workbook - SKU Worksheet #1 - SKU Worksheet #2 - Brand #2 Workbook

Is there an easy way to do this using PowerQuery?

One complication: every workbook contains 2 summary tabs at the end that would need to be excluded. Similarly, every folder contains a summary workbook that is also an aggregation that would need to be excluded.

Upvotes: 0

Views: 673

Answers (2)

Mike Honey
Mike Honey

Reputation: 15017

Personally I prefer to avoid functions, as I find them harder to debug & maintain. Instead I just use From Folder and filter to the files I want, then Add a Custom Column using the formula:

= Excel.Workbook ( [Contents] )

Then you can expand this into sheets and tables, filter that list and expand the Excel.Workbook.Data column to get to cells of data. You can keep enough of the metadata columns to identify the source of each row (which folder / file / sheet).

This keeps the your maintenance of the whole Query within the PQ UI. When issues occur (which is inevitable with Excel sources), you can step through the whole process and isolate the problem.

I only use functions if there will be multiple calls from different source queries. That doesn't sound like the case here.

Upvotes: 2

It should be. You can use From Folder (Folder.Files) to access the Excel workbooks and filter out the summary workbooks. Then, for the remaining workbooks, you can create a function which, when given an Excel workbook, goes to the worksheet you want and removes the last two column. You can add a custom column which then calls this function.

Upvotes: 0

Related Questions