hurlenko

Reputation: 1425

High memory consumption when returning a byte array from function

I'm trying to download a 10MB file and store it as an array for further processing.

Everything seems fine when I call (New-Object System.Net.WebClient).DownloadData("<url>") directly. But if I wrap the call in a function and return the result of WebClient::DownloadData, the memory footprint increases to around 500MB.

The function that I use:

function My-Download {
    param (
        [Parameter(Mandatory = $True, Position = 1)] [String] $UrlCode
    )
    (New-Object System.Net.WebClient).DownloadData($UrlCode)
}
$x = My-Download "https://file-examples.com/wp-content/uploads/2017/04/file_example_MP4_1280_10MG.mp4"

The reason I wrapped it inside of the function is that I also do additional processing on the data before returning it but even this small example illustrates the problem.

Calling $x = (New-Object System.Net.WebClient).DownloadData("https://file-examples.com/wp-content/uploads/2017/04/file_example_MP4_1280_10MG.mp4") results in 83MB:

[Screenshot: direct call memory consumption]

Calling the above function results in 500MB:

[Screenshot: wrapper function memory consumption]

What is the reason for such a high memory usage and what can I do to optimize it?

PowerShell version:

Major  Minor  Build  Revision
-----  -----  -----  --------
5      1      17134  407

Upvotes: 4

Views: 411

Answers (1)

mklement0

Reputation: 439193

The [System.Net.WebClient] type's .DownloadData() method returns a byte array ([byte[]]).

  • If you assign the output from a call to that method to a variable directly, the variable receives that byte array as-is.

  • By contrast, if a call to that method is used to produce implicit output from a function, the [byte[]] array's elements are sent to the pipeline, one by one (byte by byte).
    The pipeline is designed for streaming: objects are processed one by one as they become available, rather than being collected in full first. This keeps memory use low at the cost of execution speed.

Assigning the function's output to a variable then causes PowerShell to implicitly collect the individual output objects (bytes in this case) in a regular [object[]] array.

In other words: the original [byte[]] array was first enumerated, only to be collected again in another array, this time an [object[]]-typed one. That is unnecessary and inefficient in your scenario.
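
The difference can be seen without downloading anything. The following is a minimal sketch: Get-BytesEnumerated is a hypothetical stand-in for the download function, and [byte[]]::new(1024) stands in for the call to .DownloadData(). In the [object[]] result each byte is stored as a separate boxed object behind a reference, which likely accounts for much of the extra memory you are seeing.

```powershell
# Hypothetical stand-in for the download function: a method call that
# returns a [byte[]] and is used as implicit output.
function Get-BytesEnumerated {
    [byte[]]::new(1024)           # the array is enumerated, byte by byte
}

$direct  = [byte[]]::new(1024)    # direct assignment keeps the [byte[]] as-is
$wrapped = Get-BytesEnumerated    # the bytes are collected in an [object[]]

$direct.GetType().FullName        # System.Byte[]
$wrapped.GetType().FullName       # System.Object[]
$wrapped[0].GetType().FullName    # System.Byte
```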

There are two ways to opt out of this implicit enumeration:

  • Instead of implicit output, you can use a conceptually explicit Write-Output -NoEnumerate call in order to suppress the enumeration of an output array (collection):

    Write-Output -NoEnumerate (New-Object System.Net.WebClient).DownloadData($UrlCode)
    
  • A more obscure, but more concise and faster alternative, suggested by PetSerAl in a comment on the question, is to combine implicit output with an auxiliary single-element wrapper array. PowerShell then enumerates only the wrapper array and passes the wrapped array through intact:

    , (New-Object System.Net.WebClient).DownloadData($UrlCode)
    
    • , is PowerShell's array-construction operator (the "comma operator"), and in its unary form it wraps the RHS in a single-element array (of type [object[]]).
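
Applied to the function from the question, the wrapper-array approach might look like the sketch below. Collecting the bytes in a $data variable first is an assumption, made only to show where the additional processing you mention could go.

```powershell
function My-Download {
    param (
        [Parameter(Mandatory = $True, Position = 1)] [String] $UrlCode
    )
    # Assumption: keep the bytes in a local variable so that any additional
    # processing can happen here, before the array is returned.
    $data = (New-Object System.Net.WebClient).DownloadData($UrlCode)

    # The unary comma wraps $data in a single-element array; the pipeline
    # enumerates only that wrapper, so the caller receives the [byte[]] intact.
    , $data
}
```

With this version, $x = My-Download "<url>" should leave a [byte[]] in $x, which you can verify with $x.GetType().FullName.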

Upvotes: 2
