Reputation: 1425
I'm trying to download a 10MB file and store it as an array for further processing.
Everything seems fine when I call (New-Object System.Net.WebClient).DownloadData("<url>") directly. But if I wrap the call inside a function and return the result of WebClient::DownloadData, the memory footprint increases to around 500 MB.
The function that I use:
function My-Download {
    param (
        [Parameter(Mandatory = $True, Position = 1)] [String] $UrlCode
    )
    (New-Object System.Net.WebClient).DownloadData($UrlCode)
}
$x = My-Download "https://file-examples.com/wp-content/uploads/2017/04/file_example_MP4_1280_10MG.mp4"
I wrapped the call in a function because I also do additional processing on the data before returning it, but even this small example illustrates the problem.
Calling $x = (New-Object System.Net.WebClient).DownloadData("https://file-examples.com/wp-content/uploads/2017/04/file_example_MP4_1280_10MG.mp4") directly results in a memory footprint of about 83 MB, whereas calling the above function results in about 500 MB.
What is the reason for such a high memory usage and what can I do to optimize it?
PowerShell version:
Major  Minor  Build  Revision
-----  -----  -----  --------
5      1      17134  407
Upvotes: 4
Views: 411
Reputation: 439193
The [System.Net.WebClient] type's .DownloadData() method returns a byte array ([byte[]]).
If you assign the output from a call to that method to a variable directly, the variable receives that byte array as-is.
By contrast, if a call to that method is used to produce implicit output from a function, the [byte[]] array's elements are sent to the pipeline one by one (byte by byte).
The pipeline is designed for streaming, object-by-object processing rather than collecting all results first: it trades execution speed for memory throttling, processing each object as soon as it becomes available.
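Assigning the function's output to a variable then causes PowerShell to implicitly collect the individual output objects (bytes, in this case) in a regular [object[]] array. Because each element of an [object[]] array is a reference to a separately allocated (boxed) object, a 10 MB byte array turns into roughly 10 million individual objects, which requires far more memory than the original, compact [byte[]] array.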
In other words, the original [byte[]] array was first enumerated, only to be collected again in another array, albeit an [object[]]-typed one, which is unnecessary and inefficient in your scenario.
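The effect can be observed without a network call. The following is a minimal sketch in which a hypothetical function, Get-FakeDownload, stands in for DownloadData() by returning a small, zero-filled [byte[]] array:

```powershell
# Hypothetical stand-in for DownloadData(): returns a 10-element [byte[]] array.
function Get-FakeDownload { [byte[]]::new(10) }

# Direct assignment: the variable receives the [byte[]] array as-is.
$direct = [byte[]]::new(10)

# Implicit function output: the array is enumerated byte by byte,
# and the assignment re-collects the bytes in a new [object[]] array.
$viaFunction = Get-FakeDownload

$direct.GetType().FullName        # System.Byte[]
$viaFunction.GetType().FullName   # System.Object[]
$viaFunction.Count                # 10
```

With a real 10 MB download, the same type change applies to every one of the roughly 10 million elements.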
There are two ways to opt out of this implicit enumeration:
Instead of relying on implicit output, you can make a conceptually explicit Write-Output -NoEnumerate call to suppress the enumeration of an output array (collection):
Write-Output -NoEnumerate (New-Object System.Net.WebClient).DownloadData($UrlCode)
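Applied to a function that processes the data before returning it, this might look as follows. To keep the sketch runnable offline, the hypothetical Get-FakeDownload function (and its -Length parameter) builds a local [byte[]] array; a real version would use (New-Object System.Net.WebClient).DownloadData($UrlCode) instead.

```powershell
# Hypothetical offline stand-in for My-Download; a real version would call
# (New-Object System.Net.WebClient).DownloadData($UrlCode) instead.
function Get-FakeDownload {
    param (
        [Parameter(Mandatory = $True)] [int] $Length
    )
    $bytes = [byte[]]::new($Length)
    # ... additional processing of $bytes would go here ...
    # -NoEnumerate sends the whole array to the pipeline as a single object.
    Write-Output -NoEnumerate $bytes
}

$x = Get-FakeDownload -Length 10
$x.GetType().FullName   # System.Byte[] (in Windows PowerShell 5.1)
```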
A more obscure, but more concise and faster, alternative (suggested by PetSerAl in a comment on the question) is to combine implicit output with an auxiliary single-element wrapper array: PowerShell then enumerates only the wrapper array and passes the wrapped array through as a single object:
, (New-Object System.Net.WebClient).DownloadData($UrlCode)
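A self-contained check of the wrapper-array technique might look like this. The function name Get-FakeDownload is hypothetical, and a locally created [byte[]] array stands in for the downloaded data:

```powershell
# Hypothetical offline stand-in; a real version would output
# , (New-Object System.Net.WebClient).DownloadData($UrlCode)
function Get-FakeDownload {
    $bytes = [byte[]]::new(10)
    # The leading comma wraps $bytes in a single-element array.
    # PowerShell enumerates only that wrapper, so the [byte[]] array
    # itself reaches the pipeline as one object.
    , $bytes
}

$x = Get-FakeDownload
$x.GetType().FullName          # System.Byte[]
$x.Count                       # 10

# The wrapper on its own: one element, which is the original array.
$wrapper = , [byte[]]::new(10)
$wrapper.Count                 # 1
$wrapper[0].GetType().FullName # System.Byte[]
```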
The , operator is PowerShell's array-construction operator (the "comma operator"); in its unary form, it wraps its right-hand operand in a single-element array (of type [object[]]).
Upvotes: 2