Reputation: 1614
Good Afternoon Everyone,
I am working with a Storage Area Network (SAN) that has approximately 10TB of data. I need to perform a recursive directory listing to identify specific types of files (e.g., PST files). Currently, I'm using PowerShell's Get-ChildItem with the -Include parameter, but it's exceedingly slow: the listing takes days to complete.
I found a compiled code resource here that seems relevant. Could someone explain how to use it in my scenario? Any other suggestions or insights on speeding up this process would also be greatly appreciated!
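Thanks to the wonderful @not2qubit for finding the GetFiles method of the [System.IO.Directory] class, we have a significantly faster way to locate files in large directories, with a good amount of limiting criteria. Note that the overload used below takes a [System.IO.EnumerationOptions] argument, a type that only exists in .NET Core 2.1 and later, so it requires PowerShell 7 (or PowerShell Core 6.1+) rather than Windows PowerShell 5.1.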
[System.IO.Directory]::GetFiles(
    'C:\',      # [Str] Root search directory
    'cmd.exe',  # [Str] File name pattern
    [System.IO.EnumerationOptions] @{
        AttributesToSkip         = @(
            'Hidden'
            'Device'
            # 'Temporary'
            'SparseFile'
            'ReparsePoint'
            # 'Compressed'
            'Offline'
            'Encrypted'
            'IntegrityStream'
            # 'NoScrubData'
        )
        BufferSize               = 4096        # [Int] Suggested buffer size in bytes
        IgnoreInaccessible       = $True       # [Bool] $True = skip inaccessible directories
        MatchCasing              = 0           # [Int] 0=PlatformDefault; 1=CaseSensitive; 2=CaseInsensitive
        MatchType                = 0           # [Int] 0=Simple; 1=Win32
        MaxRecursionDepth        = 2147483647  # [Int] Default = 2147483647 ([int]::MaxValue)
        RecurseSubdirectories    = $True       # [Bool]
        ReturnSpecialDirectories = $False      # [Bool] $True = return the special directory entries "." and ".."
    }
)
Method                           Maximum  Minimum  Average
-------------------------------  -------  -------  -------
[System.IO.Directory]::GetFiles  5.782s   5.082s   5.385s
Get-ChildItem                    21.647s  17.556s  19.907s
Function Start-PerformanceTest {
    <#
    .SYNOPSIS
    Test the execution time of script blocks.
    .DESCRIPTION
    Measure a block of code over a number of iterations so that informed decisions can be made about code efficiency.
    .PARAMETER ScriptBlock
    [ScriptBlock] Code to run and measure, wrapped in {}.
    .PARAMETER Measurement
    [String] Time unit in which to display measurements. (Options: Milliseconds, Seconds, Minutes, Hours, Days)
    .PARAMETER Iterations
    [Int] Number of times to run the code. (Default: 100)
    .INPUTS
    None
    .OUTPUTS
    [PSCustomObject] with Maximum, Minimum and Average properties.
    .NOTES
    VERSION  DATE            NAME                  DESCRIPTION
    ___________________________________________________________________________________________________________
    1.0      20 August 2020  Warilia, Nicholas R.  Initial version
    Credits:
    (1) Script Template: https://gist.github.com/9to5IT/9620683
    #>
    [CmdletBinding()]
    param (
        [Parameter(Mandatory)]
        [ScriptBlock]$ScriptBlock,
        [ValidateSet('Milliseconds', 'Seconds', 'Minutes', 'Hours', 'Days')]
        [string]$Measurement = 'Seconds',
        [int]$Iterations = 100
    )
    $Results = [System.Collections.ArrayList]::new()
    # Run exactly $Iterations times (-le would add one extra run)
    For ($I = 0; $I -lt $Iterations; $I++) {
        [Void]$Results.Add(
            (Measure-Command -Expression $ScriptBlock | Select-Object TotalDays, TotalHours, TotalMinutes, TotalSeconds, TotalMilliseconds)
        )
    }
    # Determine the unit suffix for the output (ms, s, m, h, d)
    Switch ($Measurement) {
        'Milliseconds' { $LengthType = 'ms' }
        default { $LengthType = $Measurement.SubString(0, 1).ToLower() }
    }
    # Measure the raw timings directly; grouping first would collapse identical runs and skew the average
    $Results | Measure-Object -Property "Total$Measurement" -Average -Maximum -Minimum | Select-Object `
        @{Name = 'Maximum'; Expression = { "$([Math]::Round($_.Maximum, 3))$LengthType" } },
        @{Name = 'Minimum'; Expression = { "$([Math]::Round($_.Minimum, 3))$LengthType" } },
        @{Name = 'Average'; Expression = { "$([Math]::Round($_.Average, 3))$LengthType" } }
}
Write-Host 'Testing: System.IO.Directory.GetFiles'
Start-PerformanceTest -Iterations:10 -ScriptBlock:{
    [System.IO.Directory]::GetFiles(
        'C:\',      # [Str] Root search directory
        'cmd.exe',  # [Str] File name pattern
        [System.IO.EnumerationOptions] @{
            AttributesToSkip         = @(
                'Hidden'
                'Device'
                # 'Temporary'
                'SparseFile'
                'ReparsePoint'
                # 'Compressed'
                'Offline'
                'Encrypted'
                'IntegrityStream'
                # 'NoScrubData'
            )
            BufferSize               = 4096        # [Int] Suggested buffer size in bytes
            IgnoreInaccessible       = $True       # [Bool] $True = skip inaccessible directories
            MatchCasing              = 0           # [Int] 0=PlatformDefault; 1=CaseSensitive; 2=CaseInsensitive
            MatchType                = 0           # [Int] 0=Simple; 1=Win32
            MaxRecursionDepth        = 2147483647  # [Int] Default = 2147483647 ([int]::MaxValue)
            RecurseSubdirectories    = $True       # [Bool]
            ReturnSpecialDirectories = $False      # [Bool] $True = return the special directory entries "." and ".."
        }
    )
}
Write-Host 'Testing: Get-ChildItem'
Start-PerformanceTest -Iterations:10 -ScriptBlock:{
    Get-ChildItem -Path:'C:\' -Filter:'cmd.exe' -Recurse -File -ErrorAction SilentlyContinue |
        Where-Object {
            # Exclude the same attributes that AttributesToSkip excludes above.
            # Unlike AttributesToSkip, this check runs after each file has already been enumerated.
            !($_.Attributes -band [System.IO.FileAttributes]::Hidden) -and
            !($_.Attributes -band [System.IO.FileAttributes]::Device) -and
            !($_.Attributes -band [System.IO.FileAttributes]::SparseFile) -and
            !($_.Attributes -band [System.IO.FileAttributes]::ReparsePoint) -and
            !($_.Attributes -band [System.IO.FileAttributes]::Offline) -and
            !($_.Attributes -band [System.IO.FileAttributes]::Encrypted) -and
            !($_.Attributes -band [System.IO.FileAttributes]::IntegrityStream)
        }
}
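For the original 10TB PST search, a streaming variant may be preferable: GetFiles builds the whole result array in memory before returning, whereas [System.IO.Directory]::EnumerateFiles yields each path as soon as it is found, so results can be written out incrementally. This is a minimal sketch assuming PowerShell 7+; the function name Find-FileFast and the '*.pst' default are illustrative, not part of the original post.

```powershell
# Requires PowerShell 7+ ([System.IO.EnumerationOptions] is a .NET Core 2.1+ type).
function Find-FileFast {
    # Hypothetical helper: streams matching paths with Directory.EnumerateFiles
    # instead of building the whole array first, as GetFiles does.
    [CmdletBinding()]
    param (
        [Parameter(Mandatory)]
        [string]$Path,              # Root directory or UNC share to search
        [string]$Pattern = '*.pst'  # Simple wildcard pattern (* and ?)
    )
    $Options = [System.IO.EnumerationOptions]@{
        RecurseSubdirectories = $true
        IgnoreInaccessible    = $true              # Skip folders we cannot open instead of throwing
        MatchCasing           = 'CaseInsensitive'  # Match *.PST as well as *.pst on every platform
        # Skip reparse points (junctions, symlinks), as in the examples above. Setting this
        # replaces the default of 'Hidden,System', so hidden files ARE returned.
        AttributesToSkip      = 'ReparsePoint'
    }
    # EnumerateFiles is lazy: each path goes down the pipeline as soon as it is found.
    foreach ($File in [System.IO.Directory]::EnumerateFiles($Path, $Pattern, $Options)) {
        $File
    }
}
```

Usage, with a placeholder share path: `Find-FileFast -Path '\\SAN01\Share' | Set-Content -Path .\pst-files.txt`. Because results stream, the output file fills up while the scan is still running.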
Upvotes: 5
Views: 6815
Reputation: 16997
After reading this answer and testing, I have come to the conclusion that the following is the fastest way I know of to find files and directories. Using .NET takes roughly 1/6th of the time of a regular Get-ChildItem query.
(Measure-Command { [IO.Directory]::GetFiles('C:\', 'cmd.exe', [IO.EnumerationOptions] @{AttributesToSkip='Hidden,Device,Temporary,SparseFile,ReparsePoint,Compressed,Encrypted'; RecurseSubdirectories=$true; IgnoreInaccessible=$true }) }).TotalSeconds
The complete list of attributes can be found here; it is summarized below with the numerical values.
0 None
1 ReadOnly
* 2 Hidden
4 System
16 Directory
32 Archive
* 64 Device
128 Normal
* 256 Temporary
* 512 SparseFile
* 1024 ReparsePoint
* 2048 Compressed
* 4096 Offline
8192 NotContentIndexed
* 16384 Encrypted
* 32768 IntegrityStream
* 131072 NoScrubData
Only the items marked with * can be used in AttributesToSkip when finding normal files. For example, trying to include Directory didn't work...
Here's your copy/paste list for AttributesToSkip:
'Hidden,Device,Temporary,SparseFile,ReparsePoint,Compressed,Encrypted'
And the full set of starred attributes in number format:
(2,64,256,512,1024,2048,4096,16384,32768,131072)
Apparently it should also be possible to use the numbers directly with:
[System.IO.FileAttributes] (2, 4, 1024, 512)
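To check that a list of names and a list of numbers describe the same flags, you can convert both and compare them. A small sketch using the copy/paste list above (Hidden=2, Device=64, Temporary=256, SparseFile=512, ReparsePoint=1024, Compressed=2048, Encrypted=16384, per the table):

```powershell
# Comma-separated names convert directly to a [Flags] enum value.
$FromNames = [System.IO.FileAttributes]'Hidden,Device,Temporary,SparseFile,ReparsePoint,Compressed,Encrypted'

# Combine the numeric values with a bitwise OR (-bor), which is how [Flags] values are built.
$Combined = 0
foreach ($Value in 2, 64, 256, 512, 1024, 2048, 16384) { $Combined = $Combined -bor $Value }
$FromNumbers = [System.IO.FileAttributes]$Combined

[int]$FromNames              # 20290
$FromNames -eq $FromNumbers  # True
```

The resulting value (either form) can be assigned to AttributesToSkip.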
Upvotes: 1
Reputation: 7087
I know this is a day late and a dollar short, but you can use robocopy for this purpose, and it will list paths longer than 255 characters:
robocopy <SourceRoot> <DummyDestinationDir> /MIR /FP /NC /NS /NDL /NJH /NJS /LOG:<LogFilePath> /L
I know it's pretty wordy, but robocopy is very quick compared to PowerShell, though I don't know how it would stack up against cmd's dir. You can either redirect standard output using ">" or use the /LOG: parameter as above; I would test to see which is faster. Note: do not use the /TEE option, since console output slows robocopy down in my experience. Also note that the file paths will be indented in the output, but this is easily rectified with a text editor that can trim leading and/or trailing whitespace.
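A sketch of the same idea wrapped in PowerShell for the PST case, assuming Windows. It swaps /MIR for /S (only a listing is needed, and with /L nothing is copied either way), passes *.pst as robocopy's file filter, adds /NP (no progress) and /R:0 /W:0 (no retries on inaccessible items), and trims the indentation in PowerShell instead of a text editor. Both function names and the %TEMP% paths are hypothetical placeholders.

```powershell
function Get-RobocopyFileList {
    # Hypothetical wrapper: list (not copy) files matching $Pattern under $Source.
    param (
        [Parameter(Mandatory)] [string]$Source,
        [string]$Pattern = '*.pst',
        [string]$LogPath = (Join-Path $env:TEMP 'robocopy-list.log')
    )
    # /L only lists, so the dummy destination is never written to.
    $Dummy = Join-Path $env:TEMP 'robocopy-dummy'
    robocopy $Source $Dummy $Pattern /S /L /FP /NC /NS /NDL /NJH /NJS /NP /R:0 /W:0 "/LOG:$LogPath" | Out-Null
    ConvertFrom-RobocopyLog -Line (Get-Content -LiteralPath $LogPath)
}

function ConvertFrom-RobocopyLog {
    # Strip the whitespace robocopy puts around each path and drop blank lines.
    param ([string[]]$Line)
    foreach ($L in $Line) {
        $Trimmed = "$L".Trim()
        if ($Trimmed) { $Trimmed }
    }
}
```

For example, `Get-RobocopyFileList -Source '\\SAN01\Share' | Set-Content .\pst-files.txt`, where the share path is a placeholder.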
Upvotes: 8
Reputation: 126842
If it's just one extension that you're after, use the -Filter parameter; it's much faster than -Include. I'd also suggest using PowerShell 3 if you can (Get-ChildItem has the new -File switch); as far as I remember, performance when listing UNC paths was improved in it (with the underlying .NET 4 support).
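A minimal sketch of the -Filter approach for a single extension (PowerShell 3+ for -File). Find-FileByExtension is a hypothetical name. The Where-Object re-check is an added safeguard: on Windows, depending on the PowerShell/.NET version and whether 8.3 short names are enabled, a pattern such as *.pst can also match longer extensions like .pstx.

```powershell
function Find-FileByExtension {
    # Hypothetical helper: -Filter is applied by the FileSystem provider while it
    # enumerates, whereas -Include filters objects after they have been retrieved.
    param (
        [Parameter(Mandatory)] [string]$Path,
        [string]$Extension = '.pst'
    )
    Get-ChildItem -LiteralPath $Path -Filter "*$Extension" -Recurse -File -ErrorAction SilentlyContinue |
        # Re-check the exact extension (-eq is case-insensitive) to rule out short-name matches.
        Where-Object { $_.Extension -eq $Extension } |
        Select-Object -ExpandProperty FullName
}
```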
Another option would be to use the dir command from a cmd window; it should be very fast.
Upvotes: 4
Reputation: 68303
As Shay says, PowerShell v3 is much better than v2.
If you just want a list of the files' full names, the legacy dir command with the /B (bare) switch is still faster than Get-ChildItem:
cmd /c dir <root path> /B /S /A-D
Upvotes: 4