Reputation: 493
My goal is to recursively search a directory for all files that contain a match for a regular expression, as quickly as possible, and then write the results to a CSV with one column for the exact matches and another for the file each match was found in. Thanks to the user woxxom, I've started playing with IO.File, as it's apparently much faster than using Select-String.
This is a project I've been working on for a long time. I was able to accomplish it with Select-String and Export-Csv, but it's a rather slow process.
Any thoughts on what I'm missing with my new attempt?
$ResultsCSV = "C:\TEMP\Results.csv"
$Directory = "C:\TEMP\examples"
$RX = "(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?:\.|dot|\[dot\]|\[\.\])){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
$TextFiles = Get-ChildItem $Directory -Include *.txt*,*.csv*,*.rtf*,*.eml*,*.msg*,*.dat*,*.ini*,*.mht* -Recurse
$out = [Text.StringBuilder]
foreach ($FileSearched in $TextFiles) {
$text = [IO.File]::ReadAllText($FileSearched)
foreach ($match in ([regex]$RX).Matches($text)) {
if (!(Test-Path $ResultsCSV)) {
'Matches,File Path' | Out-File $ResultsCSV -Encoding ASCII
$out.AppendLine('' + $match.value + ',' + $FileSearched.fullname)
$match.value | Out-File $ResultsCSV -Encoding ascii -Append
$FileSearched.Fullname | Out-File $ResultsCSV -Encoding ascii -Append
$out.ToString() | Out-File $ResultsCSV -Encoding ascii -Append -NoNewline
}
}
}
Upvotes: 1
Views: 7260
Reputation: 11052
You can improve performance by using streams for both reading and writing: a StreamReader to read each file line by line and a StreamWriter to write the results.
$ResultsCSV = "C:\TEMP\Results.csv"
$Directory = "C:\TEMP\examples"
$RX = "(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?:\.|dot|\[dot\]|\[\.\])){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
$TextFiles = Get-ChildItem $Directory -Include *.txt*,*.csv*,*.rtf*,*.eml*,*.msg*,*.dat*,*.ini*,*.mht* -Recurse
$regex = [regex]$RX # build the regex once instead of once per line
$file2 = New-Object System.IO.StreamWriter($ResultsCSV) # output stream
$file2.WriteLine('Matches,File Path') # write header
foreach ($FileSearched in $TextFiles) { # loop over files in folder
    $file = New-Object System.IO.StreamReader($FileSearched.FullName) # input stream
    # compare against $null so that an empty line does not end the loop early
    while ($null -ne ($text = $file.ReadLine())) { # read line by line
        foreach ($match in $regex.Matches($text)) {
            # write one row per match to the output stream
            $file2.WriteLine("{0},{1}", $match.Value, $FileSearched.FullName)
        } # foreach $match
    } # while $file
    $file.Close()
} # foreach
$file2.Close()
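One detail worth checking in any ReadLine() loop: PowerShell treats an empty string as false, so a loop whose condition is only the assignment stops at the first blank line, while ReadLine() itself returns $null only at the end of the input. A minimal sketch of the difference, using a StringReader and a made-up three-line sample in place of a real file:

```powershell
# Illustrative only: a StringReader stands in for a StreamReader over a real file.
$sample = "10.0.0.1`n`n192.168.1.5"   # three lines, the middle one is empty

# Assignment as the condition: '' is falsy, so the loop ends at the blank line.
$reader = New-Object System.IO.StringReader($sample)
$truthyCount = 0
while ($text = $reader.ReadLine()) { $truthyCount++ }
$reader.Dispose()

# Explicit $null test: the loop runs until the end of the input.
$reader = New-Object System.IO.StringReader($sample)
$explicitCount = 0
while ($null -ne ($text = $reader.ReadLine())) { $explicitCount++ }
$reader.Dispose()

"Truthy test read $truthyCount line(s); explicit null test read $explicitCount line(s)"
```

With files that contain blank lines, the explicit comparison is the safer form because matches after the first blank line would otherwise be skipped.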
Upvotes: 5
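For comparison, the whole-file approach from the question (IO.File plus a StringBuilder) can also be made to work. The following is only a sketch under a few assumptions: the function name Export-RegexMatch and its parameters are hypothetical, the file filter is shortened to two extensions, and both fields are quoted so that a comma inside a match or path does not break the row. It collects every row in memory and writes the CSV once at the end instead of calling Out-File per match.

```powershell
# Hypothetical helper; the name and parameters are illustrative, not from the original posts.
function Export-RegexMatch {
    param(
        [string]$Directory,
        [string]$Pattern,
        [string]$ResultsCsv
    )

    $regex = [regex]$Pattern                      # build the regex once
    $out = New-Object System.Text.StringBuilder   # an instance, not the type literal
    [void]$out.AppendLine('"Matches","File Path"')

    # Shortened extension list for the sketch; directories are filtered out.
    $files = Get-ChildItem -Path $Directory -Recurse -Include *.txt, *.csv |
        Where-Object { -not $_.PSIsContainer }

    foreach ($file in $files) {
        $text = [IO.File]::ReadAllText($file.FullName)
        foreach ($match in $regex.Matches($text)) {
            # Double any embedded quotes so the quoted CSV fields stay valid.
            $value = $match.Value.Replace('"', '""')
            $path = $file.FullName.Replace('"', '""')
            [void]$out.AppendLine(('"{0}","{1}"' -f $value, $path))
        }
    }

    # One write at the end instead of one Out-File call per match.
    [IO.File]::WriteAllText($ResultsCsv, $out.ToString())
}
```

A call might look like Export-RegexMatch -Directory 'C:\TEMP\examples' -Pattern $RX -ResultsCsv 'C:\TEMP\Results.csv'. Reading whole files keeps the code short but holds each file and all result rows in memory, so the line-by-line stream version is likely the better fit for very large inputs.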