Reputation: 33
I wrote this script to search a lot of text files (~100,000) for 4 different search criteria and export to 4 separate files, I thought it would be more efficient to perform all 4 searches on each file as it is loaded vs doing 4 full searches like the first iteration below does. I may be missing some other major inefficiencies as I am pretty new to powershell.
I have this script re written from the first version to the second, but can't figure out how to get the path and data to display together like the first version did. I am struggling to reference the object within the loop, and have pieced this second version together, which is working, but not giving me the path to the file which is necessary.
It seems like I am just missing one or two little things to get me going in the right direction. Thanks in advance for your help
1st version:
Get-ChildItem -Filter *.txt -Path "\\file\to\search" -Recurse | Select-String -Pattern "abc123" -Context 0,3 | Out-File -FilePath "\\c:\out.txt"
Get-ChildItem -Filter *.txt -Path "\\file\to\search2" -Recurse | Select-String -Pattern "abc124" -Context 0,3 | Out-File -FilePath "\\c:\out2.txt"
Get-ChildItem -Filter *.txt -Path "\\file\to\search3" -Recurse | Select-String -Pattern "abc125" -Context 0,3 | Out-File -FilePath "\\c:\out3.txt"
Get-ChildItem -Filter *.txt -Path "\\file\to\search4" -Recurse | Select-String -Pattern "abc126" -Context 0,3 | Out-File -FilePath "\\c:\out4.txt"
Output:
\\file\that\was\found\example.txt:84: abc123
\\file\that\was\found\example.txt:90: abc123
\\file\that\was\found\example.txt:91: abc123
2nd version:
##$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$ Configuration $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
############################################ Global Parameters #############################################
$SearchPath="\\file\to\search"
$ProgressFile=""\\progress\file\ResultsCount.txt"
$records = 105325
##----------------------------------------- End Global Parameters -----------------------------------------
########################################### Search Parameters ##############################################
##Search Pattern 1
$Pattern1="abc123"
$SaveFile1="\\c:\out.txt"
##Search Pattern 2
$Pattern2="abc124"
$SaveFile2="\\c:\out2.txt"
##Search Pattern 3
$Pattern3= "abc125"
$SaveFile3= "\\c:\out3.txt"
##Search Pattern 4
$Pattern4= "abc126"
$SaveFile4="\\c:\out4.txt"
##Search Pattern 5
$Pattern5= ""
$SaveFile5=""
##----------------------------------------- End Search Parameters ------------------------------------------
##$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$ End of Config $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
############################### SCRIPT #####################################################################
## NOTES
## ------
##$files=Get-ChildItem -Filter *.txt -Path $SearchPath -Recurse ## Set all files to variable #### Long running, needs to be a better way #######
##$records=$files.count ## Set record #
Get-ChildItem -Filter *.txt -Path $SearchPath -Recurse | Foreach-Object { ## loop through search folder
$i=$i+1 ## increment record
##
Get-Content $_.FullName | Select-String -Pattern $Pattern1 -Context 0,3 | Out-File -FilePath $SaveFile1 ## pattern1 search
Get-Content $_.FullName | Select-String -Pattern $Pattern2 | Out-File -FilePath $SaveFile2 ## pattern2 search
Get-Content $_.FullName | Select-String -Pattern $Pattern3 -Context 0,1 | Out-File -FilePath $SaveFile3 ## pattern3 search
Get-Content $_.FullName | Select-String -Pattern $Pattern4 -Context 0,1 | Out-File -FilePath $SaveFile4 ## pattern4 search
##Get-Content $_.FullName | Select-String -Pattern $Pattern5 -Context 0,1 | Out-File -FilePath $SaveFile5 ## pattern5 search (Comment out unneeded search lines like this one)
$progress ="Record $($i) of $($records)" ## set progress
Write-Host "Record $($i) of $($records)" ## Writes progress to window
$progress | Out-File -FilePath $ProgressFile ## progress file
} ##
############################################################################################################
Output:
abc123
abc123
abc123
Edit: Also I am trying to figure out a good way to not have to hard code in the number of records for a decent progress readout, I commented out the way I thought would work (1st & 2nd line of the script), but there needs to be a more efficient way than rerunning the same search twice, one for a count and one for the for loop.
I would be very interested in any runtime efficiency information you could provide.
Upvotes: 3
Views: 607
Reputation: 23663
To compliment the answers @Santiago Squarzon and Lee_Dailey, I think you were actually on the good way yourself knowing that the Group-Object
cmdlet is pretty expensive especially in memory usage as it chokes the PowerShell pipeline causing all the search results to be piled up in memory.
Besides, the Select-String cmdlet supports multiple (-SimpleMatch
) patterns, where concatenating the search patters with an |
(-join '|'
) will force you to use an (escaped) regular expression.
To continue on your approach:
(note that in the example, I am using my own settings to search through my script files)
$ProgressFile = '.\ResultsCount.txt'
$SearchRoot = '..\'
$Filter = '*.ps1'
$Searches = @{
'Null' = '.\Null.txt'
'Test' = '.\Test.txt'
'Object' = '.\Object.txt'
}
$Files = Get-ChildItem -Filter $Filter -Path $SearchRoot -Recurse
$Total = $Files.count
$Searches.Values |ForEach-Object { Set-Content -LiteralPath $_ -Value '' }
$i = 0
ForEach ($File in $Files) {
Get-Content -LiteralPath $File.FullName |
Select-String @($Searches.Keys) -AllMatches |ForEach-Object {
$Value = '{0}:{1}:{2}' -f $File.FullName, $_.LineNumber, $_
Add-Content -LiteralPath $Searches[$_.Pattern] -Value $Value
}
'Record {0} of {1}' -f ++$i, $Total |Tee-Object -Append .\ProgressFile.txt
}
$Searches = @{ ...
Maps the search patters with the files, you might also use a PSObject
list to specify each search (where you could add columns with e.g. context start/end values, etc.)
$Searches.Values |ForEach-Object { Set-Content -LiteralPath $_ -Value '' }
Empties the result files (knowing that they are not part of the main stream you can't use Add-Content
)
$i = 0
Unfortunately there is no automatic index that initializes with a foreach
loop (yet, see: #13772
Automatic variable for the pipeline index)
Get-Content -LiteralPath $File.FullName
Load the content once into memory
Note1: this is a string array.
Note2: the $Content
will be reused each iteration and therefore overwrites the previous one and unloads it from memory
Select-String @($Searches.Keys) -AllMatches |ForEach-Object {
Searches the string array using your (multiple) defined patterns. (you might consider to use the -SimpleMatch
parameter if your search strings contain special characters.)
Note: Unfortunately you need to embedded the $Searches.Keys
in a array subexpression operator @( )
, for details see .Net issue: #56835
Make OrderedDictionaryKeyValueCollection implement IList
$Value = '{0}:{1}:{2}' -f $File.FullName, $_.LineNumber, $_
Build an result output string.
Note: the result of the Select-String
does have a (hidden) LineNumber
and (matched) Pattern
property.
Add-Content -LiteralPath $Searches[$_.Pattern] -Value $Value
Add the result string to the specific mapped output file.
'Record {0} of {1}' -f $i++, $Total |Tee-Object -Append .\ProgressFile.txt
Tee-Object
will write the progress to the standard output (display) and also to the specific file.
Upvotes: 0
Reputation: 7479
[edit - thanks to mklement0 for pointing out the errors about speed and the -SimpleMatch
switch. [grin]]
the Select-String
cmdlet will accept a -Path
parameter ... and it is FAR [i was thinking of Get-Content
, not Get-ChidItem
] faster than using Get-ChildItem
to feed the files to S-S
. [grin]
also, the -Pattern
parameter accepts a regex OR
pattern like Thing|OtherThing|YetAnotherThing
- and it accepts simple string patterns if you use the -SimpleMatch
switch parameter.
what the code does ...
Select-String
with a path and an array of strings to search forGroup-Object
and a calculated property to group the matches by the last part of .Line
property from the S-S
callat that point, you can use the .Name
property of each GroupInfo
to select the items to send out to each file AND to build your file names.
the code ...
$SourceDir = 'D:\Temp\zzz - Copy'
$FileSpec = '*.log'
$SD_FileSpec = Join-Path -Path $SourceDir -ChildPath $FileSpec
$TargetPatternList = @(
'Accordion Cajun Zydeco'
'better-not-be-there'
'Piano Rockabilly Rowdy'
)
$GO_Results = Select-String -Path $SD_FileSpec -SimpleMatch $TargetPatternList |
Group-Object -Property {$_.Line.Split(':')[-1]}
$GO_Results
output ...
Count Name Group
----- ---- -----
6 Accordion Cajun Zydeco {D:\Temp\zzz - Copy\Grouping-List_08-02.log:11:Accordion Cajun Zydeco, D:\Temp\zzz - Copy\Grouping-List_08-09.log:11:Accordion Cajun Zy...
6 Bawdy Dupe Piano Rocka... {D:\Temp\zzz - Copy\Grouping-List_08-02.log:108:Bawdy Dupe Piano Rockabilly Rowdy, D:\Temp\zzz - Copy\Grouping-List_08-09.log:108:Bawdy...
6 Bawdy Piano Rockabilly... {D:\Temp\zzz - Copy\Grouping-List_08-02.log:138:Bawdy Piano Rockabilly Rowdy, D:\Temp\zzz - Copy\Grouping-List_08-09.log:138:Bawdy Pian...
6 Dupe Piano Rockabilly ... {D:\Temp\zzz - Copy\Grouping-List_08-02.log:948:Dupe Piano Rockabilly Rowdy, D:\Temp\zzz - Copy\Grouping-List_08-09.log:948:Dupe Piano ...
6 Instrumental Piano Roc... {D:\Temp\zzz - Copy\Grouping-List_08-02.log:1563:Instrumental Piano Rockabilly Rowdy, D:\Temp\zzz - Copy\Grouping-List_08-09.log:1563:I...
6 Piano Rockabilly Rowdy {D:\Temp\zzz - Copy\Grouping-List_08-02.log:1781:Piano Rockabilly Rowdy, D:\Temp\zzz - Copy\Grouping-List_08-09.log:1781:Piano Rockabil...
note that the .Group
contains an array of lines from the matches sent out by the S-S
call. you can send that to your output file.
Upvotes: 3
Reputation: 60045
Here is my take at solving this problem, very similar to Lee_Dailey's nice answer but with a foreach
loop. I would recommend investing some time into researching the multi-threading options available on PowerShell in case you need to increase the performance of the script, you can look specifically at the ThreadJob module by Microsoft which is really easy to use or if you can't install modules due to some work policy, you can use Runspace.
It is worth adding that you can use the -List
switch on Select-String
, this way the performance of the script would be increased even more:
-List
Only the first instance of matching text is returned from each input file. This is the most efficient way to retrieve a list of files that have contents matching the regular expression.
$map = @{
abc123 = 'C:\out_abc123.txt'
abc124 = 'C:\out_abc124.txt'
abc125 = 'C:\out_abc125.txt'
}
$pattern = $map.Keys -join '|'
$match = foreach($file in Get-ChildItem *.txt)
{
Select-String -LiteralPath $file.FullName -Pattern $pattern
}
$match | Group-Object { $_.Matches.Value } | ForEach-Object {
$_.Group | Select-Object Path, LineNumber, Line | Out-File $map[$_.Name]
}
Upvotes: 2