Xehanort


Automating the Download of a Dynamically Generated File from a Website

I am trying to automate downloading a report from http://www.wesm.ph. However, the page generates the file on demand, and the download URL looks like this:

http://www.wesm.ph/download.php?download=TUJBT1JURF8yMDE3LTA3LTI2XzIwMTctMDctMjZfR19MVVpPTi5jc3Y=

Is it possible to automate this? Thank you.


Answers (1)

user2674513


This script will grab the latest file.

Set $folderPath to the folder where the CSVs should be saved.

$folderPath = 'C:\Users\Michael\Downloads'

# If I can't download something from the last two weeks 
# then use this flag to track the error. 
$downloadSucceeded = $false

function giveBinaryEqualFile ([string] $myInput, [string] $fileName)
{
    $Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
    [System.IO.File]::WriteAllText($fileName, $myInput, $Utf8NoBomEncoding)
}

function generateAddress ([DateTime] $myDate)
{
    $localTimeZone = [System.TimeZoneInfo]::Local
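    # Philippine time is UTC+8 with no daylight saving. "China Standard Time"
    # is a Windows time zone with the same offset, so it serves as a stand-in.
    $PhilippineTimeZone = [System.TimeZoneInfo]::FindSystemTimeZoneById("China Standard Time")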

    $PhilippineNow = [System.TimeZoneInfo]::ConvertTime($myDate, $localTimeZone, $PhilippineTimeZone)

    # Address
    $address = "http://www.wesm.ph/download.php?download="

    # What is the file name? 
    $dateInName = Get-Date -Date $PhilippineNow -Format 'yyyy-MM-dd'
    $nameInURL = "MBAORTD_{0}_{0}_G_LUZON.csv" -f $dateInName
    $fileName  =     "RTD_{0}_{0}_G_LUZON.csv" -f $dateInName

    # Base64 Encode
    $byteArray = [System.Text.Encoding]::UTF8.GetBytes($nameInURL)
    $encodedFileName = [System.Convert]::ToBase64String($byteArray)

    # URL
    $url = $address + $encodedFileName

    # Object 
    $properties = @{
        'address'  = $url
        'fileName' = $fileName
    }

    New-Object PSObject -Property $properties
}


# Try to download the latest file. 
# Search the last two weeks. 
:latest for($i=0; $i -ge -14; $i--)
{
    $localNow = (Get-Date).AddDays($i)
    $name = generateAddress $localNow
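    # Invoke-WebRequest throws on HTTP error status codes (for example 404),
    # so treat a failed request like a missing file and try the previous day.
    try   { $myRequest = Invoke-WebRequest -Uri $name.address }
    catch { continue latest }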

    # Skip this URL if the file length is zero. 
    if ($myRequest.RawContentLength -eq 0)
    {  continue latest  }

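    # Skip this URL if we get the site's "not found" page.
    # AllElements is only populated by Windows PowerShell's full HTML parsing
    # (not with -UseBasicParsing, and not in PowerShell 6 or later).
    foreach ($element in $myRequest.AllElements) 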
    {   
        if ($element.class -eq 'error')
        {  continue latest  }
    }

    # We did not see an error message. 
    # We must have the file. 

    # Save the file. 
    $myPath = Join-Path $folderPath ($name.fileName)
    if (Test-Path -Path $myPath )
    {
        Write-Host "$($name.fileName) already exists. Exiting. "
        exit
    }
    else
    {  giveBinaryEqualFile ($myRequest.Content) $myPath  }

    # Record success. 
    $downloadSucceeded = $true

    # Leave the loop. 
    break latest
}

if ($downloadSucceeded)
{
    Write-Host "The download succeeded."
    Write-Host "File Name: $($name.fileName)"
}
else
{
    Write-Host "The download failed."
    Write-Host "No files available from the last two weeks. "
}

Downloading the file with a web browser and saving the HtmlWebResponseObject's content produce different files. The content is the same, but the encoding differs, and PowerShell's file-output cmdlets (such as Out-File and Set-Content) append a trailing newline. So giveBinaryEqualFile() writes the content as UTF-8 without a BOM and without the extra newline. You can reuse it in other scripts whenever you need output that matches the original file byte for byte.
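To confirm that the write really is BOM-free, a small check like the following may help. It is a sketch that reuses giveBinaryEqualFile() unchanged; the temporary file name bom-check.csv and the sample CSV text are made up for the test.

```powershell
# giveBinaryEqualFile() exactly as defined in the script above.
function giveBinaryEqualFile ([string] $myInput, [string] $fileName)
{
    $Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
    [System.IO.File]::WriteAllText($fileName, $myInput, $Utf8NoBomEncoding)
}

# Write a small CSV (hypothetical sample data) to a temporary file.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'bom-check.csv'
giveBinaryEqualFile "a,b`r`n1,2" $path

# Read the raw bytes back and look for the UTF-8 BOM (EF BB BF).
$bytes  = [System.IO.File]::ReadAllBytes($path)
$hasBom = $bytes.Length -ge 3 -and $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF
Remove-Item $path

# Expect 8 bytes, no BOM, and a last byte of 50 ('2'), i.e. no trailing newline.
"Bytes: $($bytes.Length), BOM: $hasBom, last byte: $($bytes[-1])"
```

If a BOM had been written, the first three bytes would be EF BB BF; here they are just the first characters of the data.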

The script converts the date to Philippine time before building the file name, so it requests the same date the site uses.
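The conversion matters near midnight, when the local date and the Philippine date can differ and the script would otherwise ask for the wrong day's file. A minimal sketch, assuming Philippine time is a fixed UTC+08:00 offset with no daylight saving; it swaps in DateTimeOffset.ToOffset for the script's TimeZoneInfo lookup so it behaves the same on any machine, and the sample timestamp is hypothetical:

```powershell
# Assumption: Philippine time is a fixed UTC+08:00 offset with no daylight saving.
$philippineOffset = [TimeSpan]::FromHours(8)
$invariant        = [System.Globalization.CultureInfo]::InvariantCulture

# Hypothetical moment: 18:00 UTC on 25 July 2017.
$utc    = [DateTimeOffset]::new(2017, 7, 25, 18, 0, 0, [TimeSpan]::Zero)
$manila = $utc.ToOffset($philippineOffset)

# Format both dates the way the file name expects (yyyy-MM-dd).
$utcDate    = $utc.ToString('yyyy-MM-dd', $invariant)      # 2017-07-25
$manilaDate = $manila.ToString('yyyy-MM-dd', $invariant)   # 2017-07-26
"UTC date: $utcDate, Manila date: $manilaDate"
```

At that moment it is already 02:00 on 26 July in Manila, so the file name must use 2017-07-26.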

The file name in the URL is Base64-encoded (per Obsidian Age's comment), so the script encodes it the same way.
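Decoding the download parameter from the question's example URL shows what is being encoded. This sketch assumes the MBAORTD_<date>_<date>_G_LUZON.csv pattern that generateAddress uses; other reports on the site may be named differently.

```powershell
# The download parameter from the example URL in the question.
$parameter = 'TUJBT1JURF8yMDE3LTA3LTI2XzIwMTctMDctMjZfR19MVVpPTi5jc3Y='

# Decode it back to the plain file name.
$decoded = [System.Text.Encoding]::UTF8.GetString([System.Convert]::FromBase64String($parameter))
$decoded   # MBAORTD_2017-07-26_2017-07-26_G_LUZON.csv

# Build the same name from a date, as generateAddress does, and encode it again.
$nameInURL = 'MBAORTD_{0}_{0}_G_LUZON.csv' -f '2017-07-26'
$encoded   = [System.Convert]::ToBase64String([System.Text.Encoding]::UTF8.GetBytes($nameInURL))
$encoded -ceq $parameter   # True

# The full download URL for that date.
$url = 'http://www.wesm.ph/download.php?download=' + $encoded
```

Because the encoding round-trips exactly, any date can be turned into a download URL without visiting the page first.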

The :latest label lets continue and break jump out of the inner foreach loop and the outer for loop early.
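A stripped-down version of the script's loop shows how the labeled continue and break behave. The $pages data is hypothetical and stands in for each day's AllElements class names.

```powershell
# Hypothetical class names for three candidate days; only the last has no 'error'.
$pages = @(
    @('header', 'error'),
    @('header', 'error'),
    @('header', 'table')
)

$found = $null
:latest for ($i = 0; $i -lt $pages.Count; $i++)
{
    foreach ($class in $pages[$i])
    {
        # Skip the rest of this day and go to the next iteration of the OUTER loop.
        if ($class -eq 'error') { continue latest }
    }

    # No error marker on this page: record it and leave the outer loop.
    $found = $i
    break latest
}

"First usable page: $found"   # First usable page: 2
```

Without the label, continue inside the foreach would only move on to the next class name, not to the next day.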

