Reputation: 21
I have a problem with PowerShell performance while searching a 40 GB log file. I need to check whether any of 1000 email addresses appear in this 40 GB file. With my current script this would take about 180 hours :D Any ideas?
$logFolder = "H:\log.txt"
$adressen = Get-Content H:\Adressen.txt
$ergebnis = @()
foreach ($adr in $adressen) {
    # One full scan of the 40 GB log per address
    $suche = Select-String -Path $logFolder -Pattern "\[\(\'from\'\,.*$adr.*\'\)\]" -List
    $aktiv = $false
    $adr
    if ($suche) {
        $aktiv = $true
    }
    if ($aktiv -eq $true) {
        $ergebnis += $adr + ";Ja"
    }
    else {
        $ergebnis += $adr + ";Nein"
    }
}
$ergebnis | Out-File H:\output.txt
Upvotes: 2
Views: 216
Reputation: 399
Don't read the file 1000 times.
Build one regex pattern that contains all 1000 addresses as alternatives (it's going to be a huge pattern, but hey, much smaller than 40 GB). Like:
$Pattern = "\[\(\'from\'\,.*(?:$( $adressen -join '|' )).*\'\)\]"
Then run your Select-String once with that pattern and save the result so you can do an address-by-address search on it. Hopefully the result will be much smaller than 40 GB, and the whole thing should be much faster.
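Roughly, the two passes could fit together like this (a sketch reusing the question's file paths and variable names; the (?:...) group keeps the alternation from swallowing the rest of the pattern, and [regex]::Escape is added on the assumption that the dots in the addresses should match literally):
$adressen = Get-Content H:\Adressen.txt

# Escape each address so regex metacharacters (e.g. dots) match literally,
# then join them into a single alternation inside one pattern.
$escaped = $adressen | ForEach-Object { [regex]::Escape($_) }
$Pattern = "\[\(\'from\'\,.*(?:$( $escaped -join '|' )).*\'\)\]"

# First pass: one scan of the 40 GB file instead of 1000.
$treffer = Select-String -Path H:\log.txt -Pattern $Pattern | ForEach-Object { $_.Line }

# Second pass: check each address only against the (hopefully small) set of matching lines.
$ergebnis = foreach ($adr in $adressen) {
    if ($treffer -match [regex]::Escape($adr)) { "$adr;Ja" } else { "$adr;Nein" }
}
$ergebnis | Out-File H:\output.txt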
Upvotes: 1
Reputation: 3908
As mentioned in the comments, replace
$ergebnis = @()
with
$ergebnis = New-Object System.Collections.ArrayList
and
$ergebnis+=$adr + ";Ja"
with
$ergebnis.add("$adr;Ja")
and, in the other branch,
$ergebnis.add("$adr;Nein")
This will speed up the loop quite a bit, because += on a PowerShell array allocates a new array and copies every element on each iteration, whereas ArrayList.Add() appends in place.
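For context, the loop body could then look like this (a sketch using the question's variable names; the [void] cast is an addition here, because ArrayList.Add() returns the index of the new element, which would otherwise leak into the script's output):
$ergebnis = New-Object System.Collections.ArrayList

foreach ($adr in $adressen) {
    $suche = Select-String -Path $logFolder -Pattern "\[\(\'from\'\,.*$adr.*\'\)\]" -List
    if ($suche) {
        # Add() returns the new index; [void] discards it
        [void]$ergebnis.Add("$adr;Ja")
    }
    else {
        [void]$ergebnis.Add("$adr;Nein")
    }
}
$ergebnis | Out-File H:\output.txt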
Upvotes: 0