Reputation: 333
I have a requirement to check the availability of 1,000 different URLs, listed in a text file, from a single Windows Server 2016 virtual machine with PowerShell v5.1 installed.
The check needs to run at an interval of every 5 minutes.
My first thought was to combine the Get-Content and Invoke-WebRequest cmdlets in a foreach loop:
$urlList = Get-Content -Path "c:\URLsList.txt"
foreach ($url in $urlList) {
    $result = Invoke-WebRequest $url
    $result.StatusCode
}
But given the number of URLs (1,000), I'm not sure whether Invoke-WebRequest is scalable enough for this task.
I didn't see any mention of best practices or limitations in the official documentation: https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.utility/invoke-webrequest?view=powershell-5.1
While searching, I learned about PowerShell jobs, but since all 1,000 URLs must be checked every 5 minutes, I'm not sure whether they are a good fit.
Upvotes: 1
Views: 1179
Reputation: 437100
An inherent limitation of Invoke-WebRequest (and Invoke-RestMethod) is that it can act on just one URL at a time.
Targeting multiple URLs in parallel therefore requires command-external parallelism, such as via (slow and resource-intensive) PowerShell jobs or (lightweight and therefore preferable) thread jobs - via the ThreadJob module, which can be installed on demand in Windows PowerShell and comes with PowerShell (Core) 7+ - or, most efficiently, via the equally thread-based ForEach-Object -Parallel feature in PowerShell 7+.
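Applied to your scenario, a thread-job-based check of the URLs in your text file could look like the following sketch. The file path is taken from your question; the throttle limit of 50 is an assumption you would need to tune for your machine and network:

```powershell
# Sketch: parallel availability check with thread jobs.
# Requires the ThreadJob module in Windows PowerShell (Install-Module ThreadJob);
# it ships with PowerShell 7+. The throttle limit of 50 is an assumption - tune it.
$urlList = Get-Content -Path 'c:\URLsList.txt'
$urlList | ForEach-Object {
    Start-ThreadJob -ThrottleLimit 50 {
        try {
            $status = (Invoke-WebRequest -UseBasicParsing $using:_).StatusCode
        }
        catch {
            # Invoke-WebRequest throws on non-success status codes in Windows PowerShell,
            # so for an *availability* check the failure must be caught and reported.
            $status = "FAILED: $($_.Exception.Message)"
        }
        "$using:_ = $status"
    }
} | Receive-Job -Wait -AutoRemoveJob
```

Note the try / catch: without it, a single unreachable URL would surface as a terminating job error rather than as a reportable result.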
However, command-external parallelism, even in its thread-based form, invariably entails nontrivial overhead.
Therefore, consider using curl.exe, which ships with recent Windows versions and has built-in support for targeting multiple URLs, including in parallel.
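For your use case, the curl.exe arguments can be built directly from the URL list file. The following is a sketch that discards the response bodies and reports only the status codes; it assumes curl.exe 7.75+ (for the %{url} write-out variable) and, again, a parallel-transfer cap of 50:

```powershell
# Sketch: fetch all URLs from the list file in parallel with curl.exe.
# Assumes curl.exe 7.75+ for the %{url} write-out variable; 'NUL' is the
# Windows null device, used to discard response bodies.
$urlArgs = foreach ($url in Get-Content 'c:\URLsList.txt') { $url, '-o', 'NUL' }
curl.exe -s --parallel --parallel-max 50 -L -w '%{url} = %{http_code}\n' $urlArgs
```

One caveat for 1,000 URLs: passing them all as arguments may exceed the Windows command-line length limit (roughly 32K characters), in which case splitting the list into batches, or feeding curl a config file of `url = ...` lines via `--config`, avoids the problem.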
Below is a performance comparison based on sample code that performs GET requests against 10 URLs and reports the responses' HTTP status codes, using a variety of sequential and parallel approaches.
Absolute timings will vary, even between runs, but the ratios should give a sense of what performs best.
The ranking may differ on Unix-like platforms. Curiously, on an M1 Mac I see the operations being slower overall, with curl even slower than the comparable Invoke-WebRequest approaches.
The source code is below; you can easily tweak it to provide more URLs and to experiment with the degree of parallelism.
Sample results from Windows PowerShell (values in seconds, fastest first):
Method Duration
------ --------
curl, parallel 0.2868489
Invoke-WebRequest, Start-ThreadJob 0.5779788
curl, sequential 1.9407611
Invoke-WebRequest, sequential 2.3540807
Invoke-WebRequest, ForEach-Object -Parallel N/A
Sample results from PowerShell (Core) 7.3.4 (on Windows):
Method Duration
------ --------
curl, parallel 0.27
Invoke-WebRequest, ForEach-Object -Parallel 0.42
Invoke-WebRequest, Start-ThreadJob 0.52
curl, sequential 1.89
Invoke-WebRequest, sequential 2.05
Source code:
# Sample URLs
$urls = @(
    'http://www.example.org'
    'http://www.example.com'
    'https://en.wikipedia.org'
    'https://de.wikipedia.org'
    'https://fr.wikipedia.org'
    'https://it.wikipedia.org'
    'https://es.wikipedia.org'
    'https://ru.wikipedia.org'
    'https://ru.wikipedia.org'
    'https://als.wikipedia.org'
)
# Code that implements various approaches.
$scriptBlock = {
    param(
        [switch] $UseCurl,
        [switch] $Parallel,
        [switch] $UseThreadJobs
    )
    if ($UseCurl) {
        # Use curl.exe
        $curlExe = if ($IsCoreCLR) { 'curl' } else { 'curl.exe' }
        $urlArgs = foreach ($url in $urls) { $url, '-o', '/dev/null' }
        $parallelArgs = @()
        if ($Parallel) { $parallelArgs = '--parallel', '--parallel-max', $numParallelTransfers }
        & $curlExe -s -w '%{url} = %{http_code}\n' $parallelArgs -L $urlArgs
    }
    else {
        # Use Invoke-WebRequest
        $ProgressPreference = 'SilentlyContinue'
        if ($Parallel) {
            if ($UseThreadJobs) {
                $urls | ForEach-Object {
                    Start-ThreadJob -ThrottleLimit $numThreads { "$using:_ = " + (Invoke-WebRequest $using:_).StatusCode }
                } | Receive-Job -Wait -AutoRemoveJob
            }
            else { # ForEach-Object -Parallel
                $urls | ForEach-Object -ThrottleLimit $numThreads -Parallel { "$_ = " + (Invoke-WebRequest $_).StatusCode }
            }
        }
        else { # sequential
            $urls | ForEach-Object { "$_ = " + (Invoke-WebRequest $_).StatusCode }
        }
    }
}
# Set the desired number of parallel threads / transfers:
$numThreads = 10 # for ForEach-Object -Parallel, whose default is 5
$numParallelTransfers = 50 # For curl.exe: 50 is the default, and lowering it hurts performance
# Run benchmarks
@(
    [pscustomobject] @{
        Method   = 'Invoke-WebRequest, sequential'
        Duration = (Measure-Command { Write-Verbose -Verbose 'Invoke-WebRequest sequential solution:'; & $scriptBlock | Out-Host }).TotalSeconds
    }
    [pscustomobject] @{
        Method   = 'Invoke-WebRequest, ForEach-Object -Parallel'
        Duration =
            if ($PSVersionTable.PSVersion.Major -lt 7) { 'N/A' }
            else { (Measure-Command { Write-Verbose -Verbose 'Invoke-WebRequest with ForEach-Object -Parallel:'; & $scriptBlock -Parallel | Out-Host }).TotalSeconds }
    }
    [pscustomobject] @{
        Method   = 'Invoke-WebRequest, Start-ThreadJob'
        Duration =
            if (-not (Get-Command -ErrorAction Ignore Start-ThreadJob)) { 'N/A' }
            else { (Measure-Command { Write-Verbose -Verbose 'Invoke-WebRequest with Start-ThreadJob:'; & $scriptBlock -Parallel -UseThreadJobs | Out-Host }).TotalSeconds }
    }
    [pscustomobject] @{
        Method   = 'curl, sequential'
        Duration = (Measure-Command { Write-Verbose -Verbose 'curl.exe sequential solution:'; & $scriptBlock -UseCurl | Out-Host }).TotalSeconds
    }
    [pscustomobject] @{
        Method   = 'curl, parallel'
        Duration = (Measure-Command { Write-Verbose -Verbose 'curl.exe parallel solution:'; & $scriptBlock -UseCurl -Parallel | Out-Host }).TotalSeconds
    }
) |
    ForEach-Object -Begin {
        Write-Verbose -Verbose "Timing in seconds for $($urls.Count) URLs, based on $numThreads simultaneous threads running Invoke-WebRequest / up to $numParallelTransfers parallel curl.exe transfers:"
    } -Process {
        $_
    } |
    Sort-Object Duration
Upvotes: 1