Callie J
Callie J

Reputation: 31306

Understanding performance impact of "write-output"

I'm writing a Powershell script (PS version 4) to parse and process IIS log files, and I've come across an issue I don't quite understand: write-output seems to add significant processing time to the script. The core of it is this (there is more, but this demonstrates the issue):

$file = $args[0]
$linecount = 0
$start = [DateTime]::Now
$reader = [IO.File]::OpenText($file)
while ($reader.Peek() -ge 0) {
    $line = $reader.ReadLine()
    $linecount++

    if (0 -eq ($linecount % 10000)) {
        $batch = [DateTime]::Now
        [Console]::Error.WriteLine("    Processed $linecount lines ($file) in $(($batch - $start).TotalMilliseconds)ms")
        $start = $batch
    }

    $parts = $line.split(' ')
    $out = "$file,$($parts[0]) $($parts[1]),$($parts[2]),$($parts[3]),$($parts[4]),$($parts[5]),$($parts[6]),$($parts[7])"


    ## Send the output out - comment in/out the desired output method
    ## Direct output - roughly 10,000 lines / 880ms
    $out

    ## Via write-output - roughly 10,000 lines / 1500ms
    write-output $out
}
$reader.Close()

Invoked as .\script.ps1 {path_to_340,000_line_IIS_log} > bob.txt; progress/performance timings are given on stderr.

The script above has two output lines - the write-output one is reporting 10,000 lines every 1500ms, whereas the line that does not have write-output takes as good as half that, averaging about 880ms per 10,000 lines.

I thought that an object defaulted to write-output if it had no other thing (i.e., I thought that "bob" was equivalent to write-output "bob"), but the times I'm getting argue against this.

What am I missing here?

Upvotes: 2

Views: 1296

Answers (1)

mjolinor
mjolinor

Reputation: 68273

Just a guess, but:

Looking at the help on write-output

Write-Output [-InputObject] <PSObject[]> [-NoEnumerate]  [<CommonParameters>]

You're giving it an list of objects as an argument, so it's having to spend a little time assembling to them into an array internally before it does the write, whereas simply outputting them just streams them to the pipeline immediately. You could pipe them to Write-Object, but that's going to add another pipeline which might be even worse.

Edit
In addition you'll find that it's adding .062ms per operation (1500 -880)/10000. You have to scale that to very large data sets before it becomes noticeable.

Upvotes: 1

Related Questions