Nick W.

Reputation: 1614

Powershell fastest directory list

Good Afternoon Everyone,

I am working with a Storage Area Network (SAN) that has approximately 10TB of data. I need to perform a recursive directory listing to identify specific types of files (e.g., PST files). Currently, I'm using PowerShell's Get-ChildItem -Include command, but it's exceedingly slow—taking days to complete the task.

Requirement(s):

  1. The solution should significantly reduce the time required for the directory listing, ideally cutting it down to a few hours.
  2. I only require the list of file paths; extracting file properties is not necessary.

Questions:

  1. Are there any more efficient methods or tools (preferably PowerShell or Windows CMD based) that could expedite this recursive directory listing on a large-scale data set?

Side Note(s):

I found a compiled code resource here that seems relevant. Could someone provide guidance on how to use it in my scenario? Any other suggestions or insights on speeding up this process would be greatly appreciated!

Final Result


Thanks to the wonderful @not2qubit for finding the GetFiles method of the [System.IO.Directory] class, we have a significantly faster way to locate files in large directories with a good amount of limiting criteria.

    [System.IO.Directory]::GetFiles(
        'C:\',                                        # [Str] Root Search Directory
        'cmd.exe',                                    # [Str] File Name Pattern
        [System.IO.EnumerationOptions] @{
            AttributesToSkip         = @(
                'Hidden'
                'Device'
                # 'Temporary'
                'SparseFile'
                'ReparsePoint'
                # 'Compressed'
                'Offline'
                'Encrypted'
                'IntegrityStream' 
                # 'NoScrubData'
            )
            BufferSize               = 4096           # [Int]  Default=4096
            IgnoreInaccessible       = $True          # [Bool] True=Ignore Inaccessible Directories
            MatchCasing              = 0              # [Int]  0=PlatformDefault; 1=CaseSensitive; 2=CaseInsensitive
            MatchType                = 0              # [Int]  0=Simple; 1=Advanced
            MaxRecursionDepth        = 2147483647     # [Int]  Default=2147483647
            RecurseSubdirectories    = $True          # [Bool] 
            ReturnSpecialDirectories = $False         # [Bool] $True=Return the special directory entries "." and "..";
        }
    )
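
For the PST use case in the question, a streaming variant may be worth trying. The sketch below is not part of the original test: it assumes PowerShell 6 or later (System.IO.EnumerationOptions is not available in the .NET Framework used by Windows PowerShell 5.1), and the function name Find-FilesByPattern and the example paths are hypothetical. It uses EnumerateFiles instead of GetFiles so that paths are emitted as they are found rather than collected into one large array first.

```powershell
# Minimal sketch; Find-FilesByPattern is a hypothetical helper name.
function Find-FilesByPattern {
    param(
        [Parameter(Mandatory)]
        [string]$Root,                 # Root search directory
        [string]$Pattern = '*.pst'     # File name pattern
    )

    $options = [System.IO.EnumerationOptions]@{
        RecurseSubdirectories = $true          # Walk the whole tree
        IgnoreInaccessible    = $true          # Skip directories we cannot read
        AttributesToSkip      = 'ReparsePoint' # Assumption: do not follow links/junctions
    }

    # EnumerateFiles yields each path lazily, so the pipeline can start
    # writing results before the whole tree has been scanned.
    [System.IO.Directory]::EnumerateFiles($Root, $Pattern, $options)
}

# Example call (hypothetical paths):
# Find-FilesByPattern -Root 'D:\Shares' | Set-Content -Path 'D:\pst-files.txt'
```

Note that setting AttributesToSkip replaces the default (Hidden and System), so this sketch also returns hidden and system files.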

Results

[System.IO.Directory]::GetFiles
Maximum Minimum Average
------- ------- -------
5.782s  5.082s  5.385s

Get-Childitem
Maximum Minimum Average
------- ------- -------
21.647s 17.556s 19.907s

Full Test Code


Function Start-PerformanceTest {
    <#
        .SYNOPSIS
            Test the execution time of script blocks.
        .DESCRIPTION
            Perform an accurate measurement of a block of code over a number of iterations, allowing informed decisions to be made about code efficiency.
        .PARAMETER ScriptBlock
            [ScriptBlock] Code to run and measure. Input code as either a ScriptBlock object or wrap it in {} and the script will attempt to convert it automatically.
        .PARAMETER Measurement
            [String] Time interval in which to display measurements. (Options: Milliseconds, Seconds, Minutes, Hours, Days)
        .PARAMETER Iterations
            [Int] Numbers of times to run the code.
        
        .INPUTS
            None
        .OUTPUTS
            None
        .NOTES
        VERSION     DATE            NAME                        DESCRIPTION
        ___________________________________________________________________________________________________________
        1.0         20 August 2020  Warilia, Nicholas R.        Initial version
        Credits:
            (1) Script Template: https://gist.github.com/9to5IT/9620683
    #>

    [CmdletBinding()]
    param (
        [Parameter(Mandatory)]
        [ScriptBlock]$ScriptBlock,
        [ValidateSet('Milliseconds', 'Seconds', 'Minutes', 'Hours', 'Days')]
        $Measurement = 'Seconds',
        [int]$Iterations = 100
    )

    $Results = [System.Collections.ArrayList]::new()

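    # Run exactly $Iterations times (-lt, not -le, which would run one extra pass)
    For ($I = 0; $I -lt $Iterations; $I++) {
        [Void]$Results.Add(
            # TotalHours is included so that -Measurement Hours has a property to read
            ((Measure-Command -Expression ([scriptblock]::Create($ScriptBlock)) | Select-Object TotalDays, TotalHours, TotalMinutes, TotalSeconds, TotalMilliseconds))
        )
    }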

    #Determine correct timestamp label
    Switch ($Measurement) {
        'Milliseconds' { $LengthType = 'ms' }
        default { $LengthType = $Measurement.SubString(0, 1).tolower() }
    }

    $Results | Measure-Object -Property "Total$Measurement" -Average -Maximum -Minimum | Select-Object `
    @{Name = 'Maximum'; Expression = { "$([Math]::Round($_.Maximum,3))$LengthType" } },
    @{Name = 'Minimum'; Expression = { "$([Math]::Round($_.Minimum,3))$LengthType" } },
    @{Name = 'Average'; Expression = { "$([Math]::Round($_.Average,3))$LengthType" } }
}

Write-Host "Testing: System.IO.Directory.GetFiles"
Start-PerformanceTest -Iterations:10 -ScriptBlock:{
    [System.IO.Directory]::GetFiles(
        'C:\',                                        # [Str] Root Search Directory
        'cmd.exe',                                    # [Str] File Name Pattern
        [System.IO.EnumerationOptions] @{
            AttributesToSkip         = @(
                'Hidden'
                'Device'
                # 'Temporary'
                'SparseFile'
                'ReparsePoint'
                # 'Compressed'
                'Offline'
                'Encrypted'
                'IntegrityStream' 
                # 'NoScrubData'
            )
            BufferSize               = 4096           # [Int]  Default=4096
            IgnoreInaccessible       = $True          # [Bool] True=Ignore Inaccessible Directories
            MatchCasing              = 0              # [Int]  0=PlatformDefault; 1=CaseSensitive; 2=CaseInsensitive
            MatchType                = 0              # [Int]  0=Simple; 1=Advanced
            MaxRecursionDepth        = 2147483647     # [Int]  Default=2147483647
            RecurseSubdirectories    = $True          # [Bool] 
            ReturnSpecialDirectories = $False         # [Bool] $True=Return the special directory entries "." and "..";
        }
    )
}

Write-Host 'Testing: Get-ChildItem'
Start-PerformanceTest -Iterations:10 -ScriptBlock:{
    Get-ChildItem -Path:'C:\' -Filter:'cmd.exe' -Recurse -File -ErrorAction SilentlyContinue |
    Where-Object {
        # Filter out files based on specified attributes
        # Note: Some attributes might not directly correspond to EnumerationOptions and need manual filtering
        !($_.Attributes -band [System.IO.FileAttributes]::Hidden) -and
        !($_.Attributes -band [System.IO.FileAttributes]::Device) -and
        !($_.Attributes -band [System.IO.FileAttributes]::SparseFile) -and
        !($_.Attributes -band [System.IO.FileAttributes]::ReparsePoint) -and
        !($_.Attributes -band [System.IO.FileAttributes]::Offline) -and
        !($_.Attributes -band [System.IO.FileAttributes]::Encrypted) -and
        !($_.Attributes -band [System.IO.FileAttributes]::IntegrityStream)
    }
}

Upvotes: 5

Views: 6815

Answers (4)

not2qubit

Reputation: 16997

After reading this answer and testing, I have come to the conclusion that the following is the fastest way I have found to locate files and directories. Using .NET directly takes roughly 1/6th of the time of a regular Get-ChildItem query.

(Measure-Command { [IO.Directory]::GetFiles('C:\', 'cmd.exe', [IO.EnumerationOptions] @{AttributesToSkip='Hidden,Device,Temporary,SparseFile,ReparsePoint,Compressed,Encrypted'; RecurseSubdirectories=$true; IgnoreInaccessible=$true }) }).TotalSeconds

The complete list of attributes can be found here, and summarized below with their numerical values.

    0       None                
    1       ReadOnly            
*   2       Hidden              
    4       System              
    16      Directory           
    32      Archive             
*   64      Device              
    128     Normal              
*   256     Temporary           
*   512     SparseFile          
*   1024    ReparsePoint        
*   2048    Compressed          
*   4096    Offline             
    8192    NotContentIndexed   
*   16384   Encrypted           
*   32768   IntegrityStream     
*   131072  NoScrubData         

Only items marked with * can be used when finding normal files.
For example, trying to include Directory did not work.

Here's your copy/paste list for AttributesToSkip:
'Hidden,Device,Temporary,SparseFile,ReparsePoint,Compressed,Encrypted'

And in number format:
(2,64,256,512,1024,2048,4096,16384,32768,131072)

Apparently it should also be possible to use the numbers directly with:

[System.IO.FileAttributes] (2, 4, 1024, 512)
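
As a hedged illustration of how the names and numbers in the table relate: FileAttributes is a flags enum, so a set of attributes is the bitwise OR of the individual values. The sketch below builds the same set once from the copy/paste string above and once from the numeric values in the table. The variable names are made up for the example.

```powershell
# PowerShell converts a comma-separated string into a combined flags value.
$byName = [System.IO.FileAttributes]'Hidden,Device,Temporary,SparseFile,ReparsePoint,Compressed,Encrypted'

# The same set built from the numeric values in the table, using bitwise OR.
$byNumber = [System.IO.FileAttributes](2 -bor 64 -bor 256 -bor 512 -bor 1024 -bor 2048 -bor 16384)

# Both forms describe the same set of attributes.
$sameSet = $byName -eq $byNumber

# Numeric value of the combined set (the sum of the individual flag values).
$asNumber = [int]$byName

# Check whether a single flag is part of the set.
$hasHidden = $byName.HasFlag([System.IO.FileAttributes]::Hidden)
```

Either form can be assigned to AttributesToSkip, since the property is of type FileAttributes.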

Upvotes: 1

Steven

Reputation: 7087

I know this is a day late and a dollar short but you can use robocopy for this purpose and it will list paths longer than 255 chars:

robocopy <SourceRoot> <DummyDestinationDir> /MIR /FP /NC /NS /NDL /NJH /NJS /LOG:<LogFilePath> /L

I know it's pretty wordy, but robocopy is very quick compared to PowerShell, though I don't know how it would stack up against cmd's dir. You can either redirect standard output using ">" or specify the /LOG: parameter as above; I would test to see which is faster. Do not use the /TEE option: in my experience, console output slows robocopy down. Also note that the file paths will be indented in the output, but this is easily rectified with a text editor that can trim leading and/or trailing whitespace.
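
If a text editor is inconvenient for a very large log, the trimming can also be scripted. This is only a sketch under assumptions: ConvertFrom-RobocopyList is a hypothetical helper name, the log is assumed to contain one indented file path per line (as produced by the switches above), and the extension filter is added here to match the PST use case in the question.

```powershell
# Minimal sketch; ConvertFrom-RobocopyList is a hypothetical helper name.
function ConvertFrom-RobocopyList {
    param(
        [Parameter(Mandatory)]
        [string]$LogPath,              # Log file written by robocopy /LOG:
        [string]$Extension = '.pst'    # Keep only paths ending in this extension
    )

    # Get-Content streams the log line by line, so large logs are not
    # loaded into memory all at once.
    Get-Content -LiteralPath $LogPath |
        ForEach-Object { $_.Trim() } |     # Remove the leading indentation
        Where-Object {
            # Drop empty lines and keep case-insensitive extension matches
            $_ -and $_.EndsWith($Extension, [System.StringComparison]::OrdinalIgnoreCase)
        }
}

# Example call (hypothetical paths):
# ConvertFrom-RobocopyList -LogPath 'C:\Temp\filelist.log' | Set-Content 'C:\Temp\pst-files.txt'
```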

Upvotes: 8

Shay Levy

Reputation: 126842

If it's just one extension that you're after, use the -Filter parameter; it's much faster than -Include. I'd also suggest using PowerShell 3 if you can (Get-ChildItem has the new -File switch). As far as I remember, performance when listing UNC paths was improved in it (with underlying .NET 4 support).

Another option would be to use the dir command from a cmd window, should be very fast.
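
A minimal sketch of the -Filter approach, assuming PowerShell 3 or later for the -File switch. The function name Get-FilePathsByFilter and the example paths are hypothetical. Only the FullName of each item is emitted, since the question asks for paths and nothing else.

```powershell
# Minimal sketch; Get-FilePathsByFilter is a hypothetical helper name.
function Get-FilePathsByFilter {
    param(
        [Parameter(Mandatory)]
        [string]$Root,                 # Root search directory
        [string]$Filter = '*.pst'      # Passed to the provider via -Filter
    )

    # -Filter is applied by the file system provider while enumerating,
    # whereas -Include is applied by PowerShell after items are retrieved.
    Get-ChildItem -LiteralPath $Root -Filter $Filter -Recurse -File -ErrorAction SilentlyContinue |
        ForEach-Object { $_.FullName }  # Emit only the full path
}

# Example call (hypothetical paths):
# Get-FilePathsByFilter -Root '\\server\share' | Set-Content -Path 'C:\Temp\pst-files.txt'
```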

Upvotes: 4

mjolinor

Reputation: 68303

As Shay says, PowerShell v3 is much better than v2.

If you just want a list of the files' full names, the legacy dir command with the /B (bare) switch is still faster than Get-ChildItem:

cmd /c dir <root path> /B /S /A-D

Upvotes: 4
