Lego Man

Reputation: 13

What makes this PowerShell script so slow to read lines in a file?

Here is the script I am using. The file passed in is about 500 MB.

$file=$args[0]

If ($args[1] -eq 'response') {
$results = Select-String -Path $file -Pattern "(?<=sent: ).+(?= type)" | Select -Expand Matches | Select -Expand Value
}

If ($args[1] -eq 'blocked') {
$results = Select-String -Path $file -Pattern "(?<=: ).+(?= ->)" | Select -Expand Matches | Select -Expand Value
}

If ($args[1] -eq 'clients') {
$results = Select-String -Path $file -Pattern "(?<=:\d\d ).+(?= \[)" | Select -Expand Matches | Select -Expand Value
}

$results | Group-Object | Select-Object Name,Count | Sort-Object Count -Descending

Is there a faster way to get this same data out? I'm not married to PowerShell by any means.

Upvotes: 0

Views: 1038

Answers (1)

mjolinor

Reputation: 68341

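I'd trade Select-String for Get-Content with a -ReadCount of 1000-5000, then use -match as an array operator against the resulting arrays of lines. Select-String builds a MatchInfo object for every matching line, and each of those then passes through two Select-Object stages, which likely accounts for much of the time on a file this size. Feed the matched strings into a hash table accumulator to get the counts.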

Not tested.

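$file = $args[0]
$ht = @{}   # accumulator: matched text -> count

If ($args[1] -eq 'response') {
  $pattern = "(?<=sent: ).+(?= type)"
  # Read in batches of 1000 lines; -match on an array returns the lines that match.
  Get-Content $file -ReadCount 1000 |
    ForEach-Object {
      $_ -match $pattern |
        # Count the matched text, not the whole line.
        ForEach-Object { $ht[[regex]::Match($_, $pattern).Value]++ }
    }
}

If ($args[1] -eq 'blocked') {
  $pattern = "(?<=: ).+(?= ->)"
  Get-Content $file -ReadCount 1000 |
    ForEach-Object {
      $_ -match $pattern |
        ForEach-Object { $ht[[regex]::Match($_, $pattern).Value]++ }
    }
}

If ($args[1] -eq 'clients') {
  $pattern = "(?<=:\d\d ).+(?= \[)"
  Get-Content $file -ReadCount 1000 |
    ForEach-Object {
      $_ -match $pattern |
        ForEach-Object { $ht[[regex]::Match($_, $pattern).Value]++ }
    }
}

# The counts live in the hash table, so enumerate that for the report.
$ht.GetEnumerator() | Sort-Object Value -Descending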
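
To see the two pieces in isolation, here is a small sketch on made-up sample lines (the lines are hypothetical, only shaped to fit the 'response' pattern, and are not taken from your log). Used against an array, -match acts as a filter and returns the whole lines that matched rather than the matched text, so the matched text has to be pulled out separately. This sketch does that with [regex]::Match, which is one option among several.

```powershell
# Hypothetical sample lines, shaped to fit the 'response' pattern.
$lines = @(
    'sent: NOERROR type A',
    'sent: NXDOMAIN type A',
    'unrelated line',
    'sent: NOERROR type AAAA'
)
$pattern = '(?<=sent: ).+(?= type)'
$ht = @{}

# Against an array, -match filters: $hits holds the three full 'sent:' lines.
$hits = @($lines -match $pattern)

# Pull the matched text out of each hit and count it in the hash table.
foreach ($line in $hits) {
    $ht[[regex]::Match($line, $pattern).Value]++
}

# NOERROR was seen twice, NXDOMAIN once.
$ht.GetEnumerator() | Sort-Object Value -Descending
```

The hash table avoids Group-Object, which has to collect every value before it can emit any counts.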

Upvotes: 1
