stluis

Reputation: 21

PowerShell performance

I have a problem with PowerShell performance while searching a 40 GB log file. I need to check whether any of 1000 email addresses appear in this file. At the current rate this would take 180 hours :D Any ideas?

$logFolder = "H:\log.txt"
$adressen= Get-Content H:\Adressen.txt
$ergebnis = @()

foreach ($adr in $adressen){
    $suche =  Select-String -Path $logFolder -Pattern "\[\(\'from\'\,.*$adr.*\'\)\]" -List
    $aktiv= $false
    $adr
    if ($suche){
        $aktiv = $true 
    }

    if ($aktiv -eq $true){
        $ergebnis+=$adr + ";Ja"
    }
    else{
        $ergebnis+=$adr + ";Nein"
    }
}
$ergebnis |Out-File H:\output.txt

Upvotes: 2

Views: 216

Answers (2)

Xavier Plantefève

Reputation: 399

Don't read the file 1000 times.

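Build a single regex pattern containing all 1000 addresses (it's going to be a huge pattern, but hey, much smaller than 40 GB). Group the alternation so the | does not split the rest of the pattern, and escape the addresses so characters like . are matched literally. Like:

$Pattern  = "\[\(\'from\'\,.*(?:$( ($adressen | ForEach-Object { [regex]::Escape($_) }) -join '|' )).*\'\)\]"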

Then run your Select-String once with that pattern, and save the matching lines so you can do the address-by-address search in them. Hopefully, the result will be much smaller than 40 GB, so that second search should be much faster.
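The single-pass idea could be sketched roughly as follows. This is only a sketch under assumptions: the function name Find-AddressesInLog and its parameters are made up for illustration, the log lines are assumed to look like the pattern in the question, and the matching lines are assumed to fit in memory.

```powershell
# Hypothetical helper (not from the original post): one pass over the log,
# then a cheap per-address check against the lines that matched.
function Find-AddressesInLog {
    param(
        [string]$LogPath,
        [string[]]$Addresses
    )

    # Escape each address so '.' etc. are literal, then join into one alternation.
    $alternation = ($Addresses | ForEach-Object { [regex]::Escape($_) }) -join '|'
    $pattern = "\[\('from',.*(?:$alternation).*'\)\]"

    # Single pass over the big file: keep only lines that mention any address.
    $hits = @(Select-String -Path $LogPath -Pattern $pattern |
        ForEach-Object { $_.Line })

    # Second pass runs only over the (hopefully small) set of matching lines.
    foreach ($adr in $Addresses) {
        $single = "\[\('from',.*$([regex]::Escape($adr)).*'\)\]"
        if (@($hits -match $single).Count -gt 0) { "$adr;Ja" }
        else { "$adr;Nein" }
    }
}
```

With the paths from the question, it might be called as Find-AddressesInLog -LogPath H:\log.txt -Addresses (Get-Content H:\Adressen.txt) | Out-File H:\output.txt. If most log lines match, the saved hits could still get large, in which case this sketch would need a different way to store them.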

Upvotes: 1

TobyU

Reputation: 3908

As mentioned in the comments, replace

$ergebnis = @()

with

$ergebnis = New-Object System.Collections.ArrayList

and

$ergebnis+=$adr + ";Ja"

with

$ergebnis.add("$adr;Ja")

or, respectively,

$ergebnis.add("$adr;Nein")

This avoids copying the whole array on every +=, which speeds up collecting the results.
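Put together, the replacement might look like the sketch below. The sample addresses and the $gefunden lookup table are hypothetical stand-ins for the real address list and search results. One detail worth knowing: ArrayList.Add returns the index of the new element, so the sketch casts the call to [void] to keep those numbers out of the script's output.

```powershell
# Hypothetical sample data standing in for Adressen.txt and the search results.
$adressen = 'alice@example.com', 'bob@example.com'
$gefunden = @{ 'alice@example.com' = $true }

# ArrayList grows in place instead of being re-created on every append.
$ergebnis = New-Object System.Collections.ArrayList

foreach ($adr in $adressen) {
    if ($gefunden.ContainsKey($adr)) {
        # [void] discards the index that Add() returns.
        [void]$ergebnis.Add("$adr;Ja")
    }
    else {
        [void]$ergebnis.Add("$adr;Nein")
    }
}

# Emit the collected lines (pipe to Out-File to write them to disk).
$ergebnis
```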

Upvotes: 0
