edwio

Reputation: 333

Is PowerShell's Invoke-WebRequest scalable?

I need to check the availability of 1,000 different URLs, read from a text file, from a single Windows Server 2016 virtual machine with PowerShell v5.1 installed.

The check must be repeated every 5 minutes.

My first idea was to use the PowerShell cmdlets Get-Content and Invoke-WebRequest in a foreach loop:

$urlList = Get-Content -Path "c:\URLsList.txt"

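foreach ($url in $urlList) {
    # -UseBasicParsing avoids depending on the Internet Explorer engine in Windows PowerShell 5.1
    $result = Invoke-WebRequest $url -UseBasicParsing
    $result.StatusCode
}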

But given the number of URLs (1,000), I'm not sure whether Invoke-WebRequest scales well enough for this task.

I didn't find any mention of best practices or limitations in the official documentation: https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.utility/invoke-webrequest?view=powershell-5.1

While searching, I learned about PowerShell jobs.

However, given that all 1,000 URLs must be checked every 5 minutes, I'm not sure whether jobs are relevant here.

Upvotes: 1

Views: 1179

Answers (1)

mklement0

Reputation: 437100

An inherent limitation of Invoke-WebRequest (and Invoke-RestMethod) is that it can target only one URL at a time.

Targeting multiple URLs in parallel therefore requires command-external parallelism, such as:

  • PowerShell jobs, via Start-Job (slow and resource-intensive).

  • Thread jobs, via Start-ThreadJob (lightweight and therefore preferable), from the ThreadJob module, which can be installed on demand in Windows PowerShell and comes with PowerShell (Core) 7+.

  • Most efficiently, the equally thread-based ForEach-Object -Parallel feature in PowerShell 7+.
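For the question's setup (Windows PowerShell 5.1), a thread-job-based check of the URL file could look roughly like the following sketch. It assumes the ThreadJob module has been installed (e.g. `Install-Module ThreadJob -Scope CurrentUser`); the names `$checkUrl` and `Test-UrlList`, the throttle limit of 20 and the 10-second timeout are illustrative choices, not anything prescribed. Unlike the question's loop, it catches errors, because Invoke-WebRequest throws for 4xx/5xx responses and unreachable hosts rather than returning a status code.

```powershell
# Checks a single URL and returns an object with its HTTP status code, or the error.
# A script block rather than a function, so that it can be passed to Start-ThreadJob,
# whose threads don't see functions defined in the calling session.
$checkUrl = {
  param([string] $Url, [int] $TimeoutSec)
  try {
    # -UseBasicParsing avoids a dependency on the Internet Explorer engine in
    # Windows PowerShell 5.1; PowerShell 7+ accepts it but ignores it.
    $response = Invoke-WebRequest -Uri $Url -UseBasicParsing -TimeoutSec $TimeoutSec -ErrorAction Stop
    [pscustomobject] @{ Url = $Url; StatusCode = [int] $response.StatusCode; Error = $null }
  }
  catch {
    # 4xx/5xx responses throw too; their status code is on the exception's response.
    # Network-level failures (DNS, timeout, ...) have no response, so StatusCode stays $null.
    $statusCode = $null
    $errorResponse = $_.Exception.Response
    if ($null -ne $errorResponse) { $statusCode = [int] $errorResponse.StatusCode }
    [pscustomobject] @{ Url = $Url; StatusCode = $statusCode; Error = $_.Exception.Message }
  }
}

# Hypothetical helper: checks all URLs listed (one per line) in the given file, using thread jobs.
function Test-UrlList {
  param(
    [Parameter(Mandatory)] [string] $Path,
    [int] $ThrottleLimit = 20,   # max. number of thread jobs running at once
    [int] $TimeoutSec = 10
  )
  $jobs = foreach ($url in (Get-Content -LiteralPath $Path | Where-Object { $_.Trim() })) {
    Start-ThreadJob -ThrottleLimit $ThrottleLimit -ScriptBlock $checkUrl -ArgumentList $url, $TimeoutSec
  }
  # Wait for all jobs, emit their output objects, and remove the jobs.
  if ($jobs) { $jobs | Receive-Job -Wait -AutoRemoveJob }
}
```

For example, `Test-UrlList -Path 'c:\URLsList.txt' | Where-Object StatusCode -ne 200` would list the URLs that didn't respond with status 200. The throttle limit is the main knob to experiment with, since each thread job still carries per-job overhead.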

However, command-external parallelism, even in the thread-based form, invariably entails nontrivial overhead.


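Therefore, consider using curl.exe, which has built-in support for targeting multiple URLs, including in parallel (the --parallel option requires curl 7.66.0 or later). curl.exe ships with Windows 10 version 1803 and later and with Windows Server 2019 and later; Windows Server 2016 doesn't include it, so it would have to be installed separately there.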
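With 1,000 URLs, passing every URL (plus an output option per URL) as a command-line argument risks exceeding Windows' 32,767-character command-line limit. curl can instead read its options from a config file via `-K` / `--config`. The following is a minimal sketch under that approach; the helper names `New-CurlConfig` and `Test-UrlListWithCurl`, the 10-second per-transfer timeout, and the assumption that the URLs are plain ASCII are all illustrative, and it presumes a curl build recent enough to support `--parallel` and the `%{url}` write-out variable.

```powershell
# Hypothetical helper: returns the lines of a curl config file (read via -K / --config)
# that requests each URL and discards its response body.
function New-CurlConfig {
  param([Parameter(Mandatory)] [string[]] $Url)
  # The null device is NUL on Windows, /dev/null elsewhere.
  $nullDevice = if ($env:OS -eq 'Windows_NT') { 'NUL' } else { '/dev/null' }
  foreach ($u in $Url) {
    # curl pairs each "url" entry with the next "output" entry.
    'url = "{0}"' -f $u
    'output = "{0}"' -f $nullDevice
  }
}

# Hypothetical helper: checks all URLs listed (one per line) in the given file with a single curl call.
function Test-UrlListWithCurl {
  param(
    [Parameter(Mandatory)] [string] $Path,
    [int] $ParallelMax = 50,   # curl's own default
    [int] $TimeoutSec = 10
  )
  $urls = @(Get-Content -LiteralPath $Path | Where-Object { $_.Trim() })
  if ($urls.Count -eq 0) { return }
  $configFile = [System.IO.Path]::GetTempFileName()
  try {
    # ASCII encoding: assumes plain-ASCII URLs, and avoids the BOM that
    # Windows PowerShell's -Encoding UTF8 would write.
    Set-Content -LiteralPath $configFile -Value (New-CurlConfig -Url $urls) -Encoding Ascii
    $curlExe = if ($env:OS -eq 'Windows_NT') { 'curl.exe' } else { 'curl' }
    # -s: silent; -L: follow redirects; -m: max. seconds per transfer;
    # -w: print "<url> <status code>" after each transfer (%{url} requires curl 7.75.0+).
    & $curlExe -s -L -m $TimeoutSec --parallel --parallel-max $ParallelMax -w '%{url} %{http_code}\n' -K $configFile |
      ForEach-Object {
        # A status code of 0 (curl prints 000) means no HTTP response was received.
        if ($_ -match '^(\S+) (\d{3})$') {
          [pscustomobject] @{ Url = $Matches[1]; StatusCode = [int] $Matches[2] }
        }
      }
  }
  finally {
    Remove-Item -LiteralPath $configFile -ErrorAction Ignore
  }
}
```

Note that with `--parallel`, results are printed in completion order, not in the order of the input file.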


The following performance comparison is based on sample code that sends GET requests to 10 URLs and reports the HTTP status codes of the responses, using a variety of sequential and parallel approaches.

  • Absolute timings will vary, even between runs, but the ratio should provide a sense of what performs best.

  • The ranking may differ on Unix-like platforms. Curiously, on an M1 Mac all operations are slower overall, and curl is even slower than the comparable Invoke-WebRequest approaches.

  • The source code is below; you can easily tweak it to provide more URLs and experiment with the degree of parallelism.

Sample results from Windows PowerShell (values in seconds, fastest first):

Method                                       Duration
------                                       --------
curl, parallel                              0.2868489
Invoke-WebRequest, Start-ThreadJob          0.5779788
curl, sequential                            1.9407611
Invoke-WebRequest, sequential               2.3540807
Invoke-WebRequest, ForEach-Object -Parallel       N/A

Sample results from PowerShell (Core) 7.3.4 on Windows (values in seconds, fastest first):

Method                                      Duration
------                                      --------
curl, parallel                                  0.27
Invoke-WebRequest, ForEach-Object -Parallel     0.42
Invoke-WebRequest, Start-ThreadJob              0.52
curl, sequential                                1.89
Invoke-WebRequest, sequential                   2.05

Source code:

# Sample URLs
$urls = @(
  'http://www.example.org'
  'http://www.example.com'
  'https://en.wikipedia.org'
  'https://de.wikipedia.org'
  'https://fr.wikipedia.org'
  'https://it.wikipedia.org'
  'https://es.wikipedia.org'
  'https://ru.wikipedia.org'
  'https://ru.wikipedia.org'
  'https://als.wikipedia.org'
)

# Code that implements various approaches.
$scriptBlock = {
  param(
    [switch] $UseCurl,
    [switch] $Parallel,
    [switch] $UseThreadJobs
  )
    
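  if ($UseCurl) {
    # use curl.exe
    $curlExe = if ($IsCoreCLR) { 'curl' } else { 'curl.exe' }
    # Discard response bodies via the platform's null device: NUL on Windows, /dev/null elsewhere.
    $nullDevice = if ($env:OS -eq 'Windows_NT') { 'NUL' } else { '/dev/null' }
    $urlArgs = foreach ($url in $urls) { $url, '-o', $nullDevice }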
    $parallelArgs = @()
    if ($Parallel) { $parallelArgs = '--parallel', '--parallel-max', $numParallelTransfers }
    & $curlExe -s -w '%{url} = %{http_code}\n' $parallelArgs -L $urlArgs
  }
  else {
    # use Invoke-WebRequest
    $ProgressPreference = 'SilentlyContinue'
    if ($Parallel) {
      if ($UseThreadJobs) {
        $urls | ForEach-Object {
          Start-ThreadJob -ThrottleLimit $numThreads { "$using:_ = " + (Invoke-WebRequest $using:_).StatusCode } 
        } | Receive-Job -Wait -AutoRemoveJob
      }
      else { # ForEach-Object -Parallel
        $urls | ForEach-Object -ThrottleLimit $numThreads -Parallel { "$_ = " + (Invoke-WebRequest $_).StatusCode }
      }
    }
    else { # sequential
      $urls | ForEach-Object {  "$_ = " + (Invoke-WebRequest $_).StatusCode }
    }
  }
} 

# Set the desired number of parallel threads / transfers:
$numThreads = 10 # for ForEach-Object -Parallel and Start-ThreadJob, both of which default to 5
$numParallelTransfers = 50 # For curl.exe: 50 is the default, and lowering it hurts performance

# Run benchmarks
@(
  [pscustomobject] @{
    Method   = 'Invoke-WebRequest, sequential'
    Duration = (Measure-Command { Write-Verbose -Verbose 'Invoke-WebRequest sequential solution:'; & $scriptBlock | Out-Host }).TotalSeconds
  }

  [pscustomobject] @{
    Method   = 'Invoke-WebRequest, ForEach-Object -Parallel'
    Duration =
      if ($PSVersionTable.PSVersion.Major -lt 7) { 'N/A' }
      else { (Measure-Command { Write-Verbose -Verbose 'Invoke-WebRequest with ForEach-Object -Parallel:'; & $scriptBlock -Parallel | Out-Host }).TotalSeconds }
  }

  [pscustomobject] @{
    Method   = 'Invoke-WebRequest, Start-ThreadJob'
    Duration =
      if (-not (Get-Command -ErrorAction Ignore Start-ThreadJob)) { 'N/A' }
      else { (Measure-Command { Write-Verbose -Verbose 'Invoke-WebRequest with Start-ThreadJob:'; & $scriptBlock -Parallel -UseThreadJobs | Out-Host }).TotalSeconds }
  }

  [pscustomobject] @{
    Method   = 'curl, sequential'
    Duration = (Measure-Command { Write-Verbose -Verbose 'curl.exe sequential solution:'; & $scriptBlock -UseCurl | Out-Host }).TotalSeconds
  }

  [pscustomobject] @{
    Method   = 'curl, parallel'
    Duration = (Measure-Command { Write-Verbose -Verbose 'curl.exe parallel solution:'; & $scriptBlock -UseCurl -Parallel | Out-Host }).TotalSeconds
  } 

) | 
  ForEach-Object -Begin {
    Write-Verbose -Verbose "Timing in seconds for $($urls.Count) URLs, based on $numThreads simultaneous threads running Invoke-WebRequest / up to $numParallelTransfers parallel curl.exe transfers:"
  } -Process {
    $_
  } |
  Sort-Object Duration
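Whichever approach is chosen, the question also requires repeating the check every 5 minutes. A minimal sketch of a loop that starts each round at a fixed interval, by sleeping only for whatever part of the interval the round itself didn't use, follows; `Invoke-Periodically` and its parameters are hypothetical. Alternatively, a Windows scheduled task with a 5-minute repetition interval could run a check script, which avoids keeping a PowerShell session running.

```powershell
# Hypothetical helper: runs $Action repeatedly, starting a new round every $Interval.
function Invoke-Periodically {
  param(
    [Parameter(Mandatory)] [scriptblock] $Action,
    [timespan] $Interval = [timespan]::FromMinutes(5),
    [int] $Count = 0   # number of rounds; 0 means run forever
  )
  $round = 0
  while ($Count -eq 0 -or $round -lt $Count) {
    $round++
    $stopwatch = [System.Diagnostics.Stopwatch]::StartNew()
    & $Action
    # Sleep only for what's left of the interval; a round that takes longer than
    # $Interval is followed by the next round immediately. No sleep after the last round.
    $remaining = $Interval - $stopwatch.Elapsed
    if ($round -ne $Count -and $remaining -gt [timespan]::Zero) {
      Start-Sleep -Milliseconds ([int] $remaining.TotalMilliseconds)
    }
  }
}
```

The `-Action` script block would contain whichever check is used, e.g. one of the curl.exe or Invoke-WebRequest variants from the benchmark code above, with its output written to a log file or filtered for failures.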

Upvotes: 1
