MrMr

Reputation: 493

Use PowerShell to Quickly Search Files for Regex and Output to CSV

My goal is to recursively search a directory for all files that contain matches for a regular expression, as quickly as possible, and then write the results to a CSV with one column for the exact matches and another for the file each match was found in. Thanks to the user woxxom, I've started experimenting with IO.File, since it's apparently much faster than Select-String.

This is a project I've been working on for a long time. I was able to get it working with Select-String and Export-Csv, but that approach is rather slow.

Any thoughts on what I'm missing with my new attempt?

$ResultsCSV = "C:\TEMP\Results.csv"
$Directory = "C:\TEMP\examples"
$RX = "(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?:\.|dot|\[dot\]|\[\.\])){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
$TextFiles = Get-ChildItem $Directory -Include *.txt*,*.csv*,*.rtf*,*.eml*,*.msg*,*.dat*,*.ini*,*.mht* -Recurse
$out = [Text.StringBuilder]

foreach ($FileSearched in $TextFiles) {
    $text = [IO.File]::ReadAllText($FileSearched)
    foreach ($match in ([regex]$RX).Matches($text)) {
        if (!(Test-Path $ResultsCSV)) {
            'Matches,File Path' | Out-File $ResultsCSV -Encoding ASCII
            $out.AppendLine('' + $match.value + ',' + $FileSearched.fullname)
            $match.value | Out-File $ResultsCSV -Encoding ascii -Append
            $FileSearched.Fullname | Out-File $ResultsCSV -Encoding ascii -Append
            $out.ToString() | Out-File $ResultsCSV -Encoding ascii -Append -NoNewline
       }
    }
}

Upvotes: 1

Views: 7260

Answers (1)

M.Hassan

Reputation: 11052

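You can improve performance by using streams for both reading and writing: a StreamReader reads each file line by line, and a single StreamWriter writes the results.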

    $ResultsCSV = "C:\TEMP\Results.csv"
    $Directory = "C:\TEMP\examples"
    $RX = "(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?:\.|dot|\[dot\]|\[\.\])){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"

    $TextFiles = Get-ChildItem $Directory -Include *.txt*,*.csv*,*.rtf*,*.eml*,*.msg*,*.dat*,*.ini*,*.mht* -Recurse

    $regex = [regex]$RX   # build the regex once instead of once per line

    $file2 = New-Object System.IO.StreamWriter($ResultsCSV)   # output stream
    $file2.WriteLine('Matches,File Path')   # write header

    foreach ($FileSearched in $TextFiles) {   # loop over files in folder

        # replaces: $text = [IO.File]::ReadAllText($FileSearched)
        $file = New-Object System.IO.StreamReader($FileSearched.FullName)   # input stream

        # compare against $null so that an empty line does not end the loop early
        while ($null -ne ($text = $file.ReadLine())) {   # read line by line
            foreach ($match in $regex.Matches($text)) {
                # write one CSV row per match to the output stream
                $file2.WriteLine("{0},{1}", $match.Value, $FileSearched.FullName)
            } # foreach $match
        } # while $file
        $file.Close()
    } # foreach $FileSearched
    $file2.Close()
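
One detail worth checking in any `ReadLine()` loop: PowerShell treats an empty string as `$false`, so a condition of the form `while ($text = $reader.ReadLine())` ends at the first blank line rather than at the end of the file. Comparing the result with `$null` avoids that. Here is a minimal sketch of the difference, using an in-memory `System.IO.StringReader` and made-up sample text (the variable names are only for illustration):

```powershell
# Three lines of sample text; the middle one is empty
$sample = "10.0.0.1`n`n192.168.1.1"

# Truthiness test: the empty second line evaluates to $false and ends the loop
$reader = New-Object System.IO.StringReader($sample)
$truthyCount = 0
while ($line = $reader.ReadLine()) { $truthyCount++ }
$reader.Dispose()

# Explicit $null test: ReadLine() returns $null only at the end of the input
$reader = New-Object System.IO.StringReader($sample)
$nullCheckCount = 0
while ($null -ne ($line = $reader.ReadLine())) { $nullCheckCount++ }
$reader.Dispose()

"Truthiness loop read $truthyCount line(s)"    # 1
"Null-check loop read $nullCheckCount line(s)" # 3
```

`StreamReader.ReadLine()` has the same return contract as `StringReader.ReadLine()`, so the same comparison should apply when reading real files.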

Upvotes: 5
