Ashar

Reputation: 3065

Improve search and archive time in a GitHub Actions PowerShell step

Below is my PowerShell workflow step. It searches for the files and folders given as a comma-separated list in itemsToInclude and adds them to a ZIP file:

      $zipFileName = "${{ github.workspace }}\package-$env:GITHUB_RUN_NUMBER.zip"
      cd "${{ github.workspace }}"

      $itemsToInclude = $env:INPUTS_FILESINZIP
      Write-Host "itemsToInclude is- $itemsToInclude"

      if (-not (Test-Path $zipFileName)) {
        $null = New-Item $zipFileName -ItemType File
      }

      $workspace = "${{ github.workspace }}"

      # Define the directories and file extensions to exclude
      $excludeDirectories = @('DevOps')
      $excludeExtensions = @('.java', '.class')

      # Include specific files and folders as per the comma-separated list
      Write-Host 'Include specific files and folders as per the comma-separated list'

      $itemsToIncludeList = $itemsToInclude -split ','
      $filesToInclude = Get-ChildItem $workspace -Recurse -File -Exclude $excludeDirectories | Where-Object {
        $itemName = $_.Name
        Write-Host "Checking file: $itemName"
        $itemsToIncludeList -contains $itemName
      }

      $filesToInclude | ForEach-Object {
        $newZipEntrySplat = @{
          EntryPath   = $_.FullName.Substring($workspace.Length)
          SourcePath  = $_.FullName
          Destination = $zipFileName
        }
        Write-Host "Adding file: $($_.FullName)"
        New-ZipEntry @newZipEntrySplat
      }

      Write-Host "Zip file created: $zipFileName"

    env:
      INPUTS_FILESINZIP: ${{ inputs.filesinzip }}

The time it takes to search for the desired files and add them to the ZIP is longer than acceptable.

To reduce the time this step takes, I want to exclude the DevOps folder and all files with the extensions .java and .class.

Unfortunately, the -Exclude parameter has no effect: all files inside the DevOps folder still show up in the "Checking file:" output.

How can I exclude them?

Upvotes: 0

Views: 57

Answers (1)

mklement0

Reputation: 439822

What you're looking for is a way to exclude an entire directory subtree from the enumeration performed by a recursive Get-ChildItem call, via -Exclude.

Unfortunately, this isn't directly supported in Windows PowerShell, and still isn't as of PowerShell (Core) 7.4:

  • The -Include and -Exclude parameters operate on item (file or directory) names only (not on paths).

  • They only operate on the matching items themselves. That is, if a directory's name matches, its subtree is still recursed into.
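To see this limitation in action, here is a minimal sketch that builds a throwaway directory tree in the temp folder (the names DevOps, src, a.txt and b.txt are made up for illustration) and shows that -Exclude 'DevOps' does not keep the files inside DevOps out of a recursive enumeration:

```powershell
# Build a small, hypothetical test tree in a unique temporary directory.
$root   = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
$devOps = Join-Path $root 'DevOps'
$src    = Join-Path $root 'src'
$null = New-Item -ItemType Directory -Path $devOps, $src
$null = New-Item -ItemType File -Path (Join-Path $devOps 'a.txt'), (Join-Path $src 'b.txt')

# -Exclude 'DevOps' is matched against item *names* only, and the DevOps
# subtree is still recursed into, so DevOps/a.txt is still listed.
$names = Get-ChildItem $root -Recurse -File -Exclude 'DevOps' |
  ForEach-Object { '{0}/{1}' -f $_.Directory.Name, $_.Name }

$names

# Clean up the test tree.
Remove-Item $root -Recurse -Force
```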

GitHub issue #15159 is a feature request to also support excluding the entire subtrees of matching subdirectories.


Workarounds:

If the subdirectories whose subtrees you want to exclude are all top-level, i.e. immediate child items of the target directory, you can use a two-step approach:

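# Note: -Exclude expects wildcard patterns, so '.java' must become '*.java', etc.
$filesToInclude = 
  Get-ChildItem $workspace -Exclude $excludeDirectories |
  Get-ChildItem -Recurse -File -Exclude ($excludeExtensions -replace '^', '*')
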
  • The first Get-ChildItem call returns only those top-level items whose names don't match any of the exclusion names, thereby excluding the unwanted directories.

  • The second call then recurses only into the non-excluded items, applying the filename-extension exclusions.
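A minimal sketch that checks the top-level two-step approach against a throwaway tree (the directory and file names are hypothetical). Note that the extension exclusions are written as wildcard patterns such as '*.java', because -Exclude matches names against wildcard patterns, so a bare '.java' would only exclude a file literally named ".java":

```powershell
# Hypothetical test tree: DevOps/a.txt, src/b.txt, src/C.java
$root   = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
$devOps = Join-Path $root 'DevOps'
$src    = Join-Path $root 'src'
$null = New-Item -ItemType Directory -Path $devOps, $src
$null = New-Item -ItemType File -Path (Join-Path $devOps 'a.txt'), (Join-Path $src 'b.txt'), (Join-Path $src 'C.java')

$excludeDirectories = @('DevOps')
# Wildcard patterns, as -Exclude requires.
$excludePatterns = @('*.java', '*.class')

# Step 1: top-level items minus DevOps; step 2: recurse into what remains.
$names =
  Get-ChildItem $root -Exclude $excludeDirectories |
  Get-ChildItem -Recurse -File -Exclude $excludePatterns |
  ForEach-Object Name

$names  # only b.txt is expected to remain

Remove-Item $root -Recurse -Force
```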


If you need to exclude the subtrees of directories matching given names on any level of the input subtree, you will need post-filtering, which results in much slower execution:

# Construct a regex from the exclusion patterns.
# NOTE: The individual patterns must be
#   * either: *literal* names, such as 'DevOps'
#     * regex metacharacters such as ., [ and ] must be escaped as \., \[ and \]
#       (for literal names, [regex]::Escape() does this for you)
#   * or: *regexes* rather than *wildcard* patterns; e.g.:
#     * instead of 'Foo*', use 'Foo.*?'
#     * instead of 'Foo?', use 'Foo.'
$regex = 
  '(?<=^|[\\/])(?:{0})(?=[\\/]|$)' -f ($excludeDirectories -join '|')

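$filesToInclude =
  & {
    # Output the target directory itself, alone.
    Get-Item $workspace
    # Recurse over subdirectories only and exclude matching subtrees.
    Get-ChildItem -Recurse -Directory $workspace |
      Where-Object { $_.FullName -notmatch $regex }
  } | # Enumerate only the files directly inside each directory of interest (no -Recurse).
  Get-ChildItem -File | # Exclude files by filename extension, e.g. '.java', '.class'.
  Where-Object { $_.Extension -notin $excludeExtensions }

  • The approach is a two-step one again:

    • First, enumerate subdirectories only, and exclude the unwanted subtrees, resulting in a list of directories of interest only.

    • Then, in each directory of interest, look for files of interest. This must be done non-recursively: every directory of interest has already been enumerated, and recursing again (starting with the target directory itself) would re-include the excluded subtrees and list files more than once.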

    • Note: The assumption is that there are (far) fewer directories than files, so that first eliminating entire subdirectory trees is more efficient than walking the entire tree and having to examine each file's path.

  • A regex is needed to rule out false positives (for instance, excluding DevOps must not also exclude a directory named DevOpsTools), and it also makes matching multiple exclusions more efficient, because a single -notmatch operation suffices.

    • For an explanation of the regex and the option to experiment with it, see this regex101.com page:
      • The linked page demonstrates a regex based on the following array of exclusion patterns: $excludeDirectories = 'DevOps', 'obj', 'bin'
      • Among the sample input paths shown, the ones with a pattern highlighted are the ones that the -notmatch operation would eliminate.
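If you prefer to experiment locally, the following sketch builds the same regex from the exclusion names 'DevOps', 'obj' and 'bin' and applies -notmatch to a few sample paths (made up for illustration). With an array on the left-hand side, -notmatch acts as a filter and returns only the non-matching paths, i.e. the ones that would be kept:

```powershell
# Exclusion names, as in the regex101.com example.
$excludeDirectories = 'DevOps', 'obj', 'bin'
$regex = '(?<=^|[\\/])(?:{0})(?=[\\/]|$)' -f ($excludeDirectories -join '|')

# Hypothetical sample paths, with Windows- and Unix-style separators.
$paths = @(
  'C:\ws\DevOps'
  'C:\ws\DevOps\scripts'
  'C:\ws\src\DevOpsTools'
  'C:\ws\src\bin'
  'C:\ws\binaries'
  '/home/runner/work/app/obj/Debug'
)

# Keep only the paths that do NOT contain an excluded name as a whole path component.
$kept = $paths -notmatch $regex
$kept  # → C:\ws\src\DevOpsTools, C:\ws\binaries
```

Only whole path components are matched: DevOpsTools and binaries survive even though they start with an excluded name. Since -notmatch is case-insensitive by default, a directory named devops would be excluded too.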

Upvotes: 1
