I have a script that searches a large number of files for a regex pattern, such as an address or phone number. The script currently runs as a job and works, but very slowly.
My current approach with Start-Job works as expected, albeit slowly. I'm looking for ways to speed it up and return results more quickly, if at all possible.
After browsing around for help, I have ventured into the world of runspaces in PowerShell. Below is the code I have mashed together with only a brief understanding of how runspaces work.
My question is how runspaces can be used so that the files from a single Get-ChildItem call are searched in parallel without the same file being scanned by more than one runspace. Is this even possible?
I created 20,000 files containing junk and manually edited 2 of them to contain the word "KETCHUP!".
10k of the files are .xml and 10k are .txt.
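For anyone who wants to reproduce a comparable test set, here is a minimal sketch. The original generator was not shown, so the folder location (`spam-test` under the temp directory), the `fileN` names and the GUID-based junk content are all assumptions.

```powershell
# Assumptions: a throwaway folder under the temp directory and GUID-based "junk" text
$root  = Join-Path ([System.IO.Path]::GetTempPath()) 'spam-test'
$count = 20000   # half .xml, half .txt, as described above
$null  = New-Item -ItemType Directory -Path $root -Force

for ($i = 0; $i -lt $count; $i++) {
    $ext  = if ($i % 2) { 'txt' } else { 'xml' }
    # 8 copies of a GUID's hex digits; hex text can never contain "KETCHUP!"
    $junk = [guid]::NewGuid().ToString('N') * 8
    [System.IO.File]::WriteAllText((Join-Path $root "file$i.$ext"), $junk)
}

# Plant the search phrase in exactly two files (one .xml, one .txt)
foreach ($name in 'file10.xml', 'file11.txt') {
    [System.IO.File]::AppendAllText((Join-Path $root $name), "`r`nKETCHUP!")
}
```

Because the junk is hexadecimal, a case-insensitive search such as Select-String's default still cannot produce a false hit.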
I'm trying not to use PowerShell 7's -Parallel parameter, as I would like to hand my script/GUI to other members of staff who are not in IT and will have nothing newer than Windows PowerShell (ISE) installed.
PowerShell: searching for a phrase in a large number of files, fast
$Finished.Text = 'Working.....'   # $Finished is a label on the GUI form

#Get list of files to search through
$path = "C:\intel\spam"
Push-Location $path
$FILES = Get-ChildItem -Filter *.XML -File

### 5 Runspace limit
$RunspacePool = [RunspaceFactory]::CreateRunspacePool(1,5)
$RunspacePool.ApartmentState = "MTA"
$RunspacePool.Open()
$runspaces = @()

# Setup scriptblock
$scriptblock = {
    Param ($files)
    foreach ($file in $files) {
        # Use the full path: the runspace does not inherit the Push-Location above
        $test = Select-String -Path $file.FullName -Pattern 'KETCHUP!' -List | Select-Object Filename, Path
        if ($test) {
            Add-Content -Path 'C:\intel\matches.txt' -Value $test.Filename
        }
    }
}

Write-Output "Starting search..."
$runspace = [PowerShell]::Create()
[void]$runspace.AddScript($scriptblock)
[void]$runspace.AddArgument($FILES) # <-- Sends ALL files to this one runspace
$runspace.RunspacePool = $RunspacePool
$AsyncObject = $runspace.BeginInvoke()
$runspaces += [PSCustomObject]@{ Pipe = $runspace; Status = $AsyncObject }

# Wait for runspaces to complete
while ($runspaces.Status.IsCompleted -contains $false) { Start-Sleep -Milliseconds 100 }

# Cleanup runspaces
foreach ($rs in $runspaces) {
    $Data = $rs.Pipe.EndInvoke($rs.Status)
    $rs.Pipe.Dispose()
}

# Cleanup runspace pool
$RunspacePool.Close()
$RunspacePool.Dispose()
Pop-Location
It really comes down to logic: break the files to search into chunks, which is entirely doable. It works more or less like this. Imagine you have 8 files and a hypothetical 2-core CPU:
[File1] [File2] [File3] [File4] [File5] [File6] [File7] [File8] <- All files from Get-ChildItem
After determining the chunk_size (4 in this hypothetical scenario, since 8 files divided by 2 cores is 4), the code divides these files into chunks:
Chunk 1: [File1] [File2] [File3] [File4]
Chunk 2: [File5] [File6] [File7] [File8]
This division is stored in the $file_chunks list:
Index 0: [File1] [File2] [File3] [File4]
Index 1: [File5] [File6] [File7] [File8]
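The splitting step can be sketched in PowerShell as follows. This is a minimal illustration rather than the function's exact code; the `File1`..`File8` names and the fixed core count of 2 are assumptions that mirror the hypothetical example.

```powershell
# Hypothetical stand-ins for the paths returned by Get-ChildItem
$files = 1..8 | ForEach-Object { "File$_" }
# Assumed core count for the example; real code would use [Environment]::ProcessorCount
$cores = 2

# chunk_size = ceiling(file count / core count) = ceiling(8 / 2) = 4
$chunk_size = [int][Math]::Ceiling($files.Count / $cores)

# Walk the list in steps of chunk_size; every file lands in exactly one chunk
$file_chunks = [System.Collections.Generic.List[string[]]]::new()
for ($start = 0; $start -lt $files.Count; $start += $chunk_size) {
    $end = [Math]::Min($start + $chunk_size, $files.Count) - 1
    $file_chunks.Add([string[]]($files[$start..$end]))
}

for ($i = 0; $i -lt $file_chunks.Count; $i++) {
    "Index ${i}: " + ($file_chunks[$i] -join ' ')
}
```

This prints `Index 0: File1 File2 File3 File4` and `Index 1: File5 File6 File7 File8`. With 10 files and 4 cores, the same loop gives a chunk size of 3 and chunks of 3, 3, 3 and 1 files; the last chunk is simply shorter.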
When parallel processing begins, each runspace (ideally on its own CPU core) picks up one chunk:
CPU Core 1 (Runspace 1): Processing [File1] [File2] [File3] [File4]
CPU Core 2 (Runspace 2): Processing [File5] [File6] [File7] [File8]
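Each runspace works only on its own subset of files. Because the chunks do not overlap, no file is ever scanned by more than one runspace, while the chunks themselves are processed in parallel.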
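The dispatch step can be sketched like this: a runspace pool with one runspace per chunk, where each runspace only reports which files it was handed (a real worker would open and search them). The hard-coded chunks and the `Runspace`/`File` output properties are illustrative assumptions.

```powershell
# Hypothetical chunks, as produced by the splitting step
$file_chunks = [System.Collections.Generic.List[string[]]]::new()
$file_chunks.Add([string[]]@('File1', 'File2', 'File3', 'File4'))
$file_chunks.Add([string[]]@('File5', 'File6', 'File7', 'File8'))

$pool = [runspacefactory]::CreateRunspacePool(1, $file_chunks.Count)
$pool.Open()

# Each runspace receives exactly one chunk and reports which files it handled
$worker = {
    param([string[]]$chunk, [int]$id)
    foreach ($file in $chunk) {
        # A real worker would search $file here
        [pscustomobject]@{ Runspace = $id; File = $file }
    }
}

$jobs = for ($i = 0; $i -lt $file_chunks.Count; $i++) {
    $ps = [powershell]::Create().AddScript($worker).AddArgument($file_chunks[$i]).AddArgument($i + 1)
    $ps.RunspacePool = $pool
    [pscustomobject]@{ Shell = $ps; Handle = $ps.BeginInvoke() }
}

# EndInvoke blocks until that runspace has finished, then returns its output
$processed = foreach ($job in $jobs) {
    $job.Shell.EndInvoke($job.Handle)
    $job.Shell.Dispose()
}
$pool.Close()
$pool.Dispose()

$processed | Sort-Object Runspace, File
```

Each file appears exactly once in `$processed`, tagged with the runspace that handled it, which is the guarantee the question asks about.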
With that in place, you can build a more robust solution, such as a reusable function:
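function Search-Files {
    param (
        [Parameter(Mandatory)][string]$Path,
        [Parameter(Mandatory)][string]$Pattern,
        [string[]]$Filter = @('*.txt', '*.xml'),
        [switch]$Recurse
    )
    # .NET resolves relative paths against the process directory, not the PowerShell location
    $Path = (Resolve-Path -LiteralPath $Path).ProviderPath
    # Compile the pattern once; Regex is thread-safe, so every runspace can share it.
    # Note: the match is case-sensitive, unlike Select-String's default.
    $regex = [regex]::new($Pattern, [System.Text.RegularExpressions.RegexOptions]::Compiled)

    # Collect the full path of every file matching any of the filters
    $file_list = [System.Collections.Generic.List[string]]::new()
    $searchOption = if ($Recurse) { [System.IO.SearchOption]::AllDirectories } else { [System.IO.SearchOption]::TopDirectoryOnly }
    foreach ($find in $Filter) {
        try { $file_list.AddRange([System.IO.Directory]::EnumerateFiles($Path, $find, $searchOption)) }
        catch { Write-Warning "An error occurred while fetching files with filter ${find}: $_" }
    }
    $file_count = $file_list.Count
    if ($file_count -eq 0) { return }

    # One runspace per logical processor, but never more runspaces than files
    $cpu_count = [Environment]::ProcessorCount
    $optimal_runspaces = [Math]::Min($cpu_count, $file_count)

    # Split the list into non-overlapping chunks so that no file is scanned twice
    $file_chunks = [System.Collections.Generic.List[string[]]]::new($optimal_runspaces)
    $chunk_size = [int][Math]::Ceiling($file_count / $optimal_runspaces)
    for ($i = 0; $i -lt $optimal_runspaces; $i++) {
        $start = $i * $chunk_size
        if ($start -ge $file_count) { break }   # rounding up can leave fewer chunks than runspaces
        $end = [Math]::Min(($start + $chunk_size - 1), ($file_count - 1))
        $file_chunks.Add($file_list.GetRange($start, $end - $start + 1).ToArray())
    }
    $runspace_pool = [runspacefactory]::CreateRunspacePool(1, $optimal_runspaces)
    $runspace_pool.Open()

    # Runs inside each runspace: returns the paths of the files in its chunk that match
    $scriptblock = {
        Param($files, $regex)
        $results = [System.Collections.Generic.List[string]]::new()
        foreach ($file in $files) {
            $reader = $null
            try {
                $reader = [System.IO.File]::OpenText($file)
                while ($reader.Peek() -ge 0) {
                    $line = $reader.ReadLine()
                    if ($regex.IsMatch($line)) {
                        $results.Add($file)
                        break   # one match is enough; move on to the next file
                    }
                }
            }
            finally { if ($reader) { $reader.Dispose() } }
        }
        return $results
    }

    $runspaces = @{}
    foreach ($chunk in $file_chunks) {
        $runspace = [powershell]::Create().AddScript($scriptblock).AddArgument($chunk).AddArgument($regex)
        $runspace.RunspacePool = $runspace_pool
        $runspaces[$runspace] = $runspace.BeginInvoke()
    }
    # Wait for all runspaces to complete
    while ($runspaces.Values | Where-Object { -not $_.IsCompleted }) { Start-Sleep -Milliseconds 100 }

    # Enumerating each runspace's output flattens it into a single list of matched file paths
    $all_results = foreach ($entry in $runspaces.GetEnumerator()) {
        $entry.Key.EndInvoke($entry.Value)
        $entry.Key.Dispose()
    }
    $runspace_pool.Close()
    $runspace_pool.Dispose()
    return $all_results
}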
# Usage
$path = "C:\intel\spam"
$filter = "*.xml"
$pattern = "KETCHUP!"
$results = Search-Files -Path $path -Filter $filter -Pattern $pattern
$results | ForEach-Object { Add-Content -Path 'C:\intel\matches.txt' -Value $_ }
