Pollastre
Pollastre

Reputation: 154

PowerBI filter folder files via filename

I want to build a PowerBI report using many excel files from a folder. I want this report to be dynamic and be able to load all the files by itself.

Unfortunately, I cannot expect my folder to only contain the files I want. I can, however, expect my source filenames to match the following regex: /.*_.*_(\d{6})\.xlsx/. I cannot find a way to match the file names against the regex. The closest way I've found is to filter the files by several conditionals, like so:

= Table.SelectRows(Origen, each ([Extension] = ".xlsx") and not Text.StartsWith([Name], "~"))

However, this approach is too permissive, tedious and error prone. It may allow an unwanted file to be used as a data source.

The Problem

I want to use only the files whose name:

Again, I cannot expect the folder to only contain these types of files. I am able to filter out the unwanted files using javascript and the above regex, but I do not know how to achieve this in PowerBI. I have looked at this question from microsoft fabric but I do not fully understand how to adapt it to my problem.

Upvotes: 1

Views: 108

Answers (2)

codeape
codeape

Reputation: 100886

Here is a solution without using the (pretty hacky) Web.Page approach. Power Query/M really needs a Text.Matches function, BTW.

Create a new blank query called getMatchingFiles with the following content (in advanced editor):

(directoryName) =>
    let
        AllFiles = Folder.Files(directoryName),
        XslxFiles = Table.SelectRows(AllFiles, each [Extension]=".xlsx"),
        IncludeParts = Table.AddColumn(XslxFiles, "NameParts", each Text.Split(Text.RemoveRange([Name], Text.Length([Name])-5, 5), "_")),
        OnlyThreeParts = Table.SelectRows(IncludeParts, each List.Count([NameParts])=3),
        OnlySixDigitsInLastPart = Table.SelectRows(
            OnlyThreeParts, 
            each Text.Length(List.Last([NameParts]))=6 and Number.FromText(List.Last([NameParts]))>=0
        )
    in
        Table.RemoveColumns(OnlySixDigitsInLastPart, "NameParts")

Then use the function in a query: enter image description here

Upvotes: 1

horseyride
horseyride

Reputation: 21393

Assuming Column1 has the filenames to test

#PriorStepName = 
fRegex=(text,regex)=>Web.Page("<script>var x='"&text&"';var y=new RegExp('"&regex&"','g');var b=x.match(y);document.write(b);</script>")[Data]{0}[Children]{0}[Children]{1}[Text]{0} ,
#"Added Custom" = Table.AddColumn(#PriorStepName, "Output",  each try fRegex([Column1], ".*_.*_(\\d{6}).xlsx") otherwise null ),
#"Filtered Rows" = Table.SelectRows(#"Added Custom", each [Output] <> null and output <> "null")
in #"Filtered Rows"

enter image description here

Upvotes: 2

Related Questions