AlexSnake

Reputation: 1

Get specific data from HTML using PowerShell

I would like to automate a task at work using PowerShell. The code below logs in to the website, and it works fine.

$username = "usern"
$password = "pass"
$ie = New-Object -com InternetExplorer.Application
$ie.visible=$true
$ie.navigate("http://www.exemple.com")
while($ie.ReadyState -ne 4) {start-sleep -m 100}
$ie.document.IHTMLDocument3_getElementByID("textfield").value = $username
$ie.document.IHTMLDocument3_getElementByID("textfield2").value = $password
$ie.document.IHTMLDocument3_getElementByID("btnLogin").Click();

Now, to download the report, I need to extract a number from the HTML body and store it in a variable, because the number changes every time I access the page. The following image shows where the number is located inside the HTML body of the webpage. It is always 12 digits:

[image]

This is my problem: I cannot get this number into a variable. If I could, I would finish the PowerShell script with the code below.

$output = "C:\Users\AlexSnake\Desktop\WeeklyReport\ReportName.pdf"
Invoke-WebRequest -Uri http://www.exemple.com.br/pdf_pub/xxxxxxxxxxxx.pdf -OutFile $output

Where you see 'xxxxxxxxxxxx', I would substitute the variable and then download the report.

Upvotes: 0

Views: 1682

Answers (2)

AlexSnake

Reputation: 1

The code below answers my question.

$($ie.Document.getElementsByTagName("a")).href | ForEach-Object {
    # Capture the digits that precede ".pdf" in each link's href
    if ($_ -match '(\d+)\.pdf') {
        $matches[1]
    }
}
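To tie this back to the question, here is a minimal sketch of storing the match in a variable and building the download URL from it. The sample href list and the names $hrefs, $reportId and $uri are assumptions for illustration only; in the real script the hrefs would come from $ie.Document.getElementsByTagName("a"). The download line is left commented out because the URL is the placeholder one from the question.

```powershell
# Hypothetical sample hrefs; in the real script this would be:
#   $hrefs = $($ie.Document.getElementsByTagName("a")).href
$hrefs = @(
    'http://www.exemple.com/home',
    'http://www.exemple.com.br/pdf_pub/123456789012.pdf'
)

# Keep the first 12-digit number that directly precedes ".pdf"
$reportId = $null
foreach ($href in $hrefs) {
    if ($href -match '(\d{12})\.pdf') {
        $reportId = $Matches[1]
        break
    }
}

$output = "C:\Users\AlexSnake\Desktop\WeeklyReport\ReportName.pdf"
$uri    = "http://www.exemple.com.br/pdf_pub/${reportId}.pdf"

# Uncomment once $reportId is confirmed to hold the expected value:
# Invoke-WebRequest -Uri $uri -OutFile $output
```

Using \d{12} instead of \d+ reflects the statement in the question that the number is always 12 digits; \d+ works too if that ever changes.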

Thanks!

Upvotes: 0

kuraara

Reputation: 111

After this line of your code:

while($ie.ReadyState -ne 4) {start-sleep -m 100}

try this:

$($ie.Document.getElementsByTagName("a")).href | ForEach {
    # The next line isn't necessary; it just demonstrates iterating through all the anchor tags on the page (feel free to comment it out)
    Write-Host "This is the href value that I'm enumerating: $_"

    # This bit checks for the number you're looking for and returns it.
    # The literal parentheses are escaped, and (\d+) captures the digits.
    if( $_ -match "javascript:openwindow\('/\.\./\.\./(\d+)\.pdf'.*\)" )
    {
        $matches[1]
    }
}

This should work.
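As a quick check outside the browser, a pattern of this kind can be tried against a sample string first. The href below is a hypothetical example of the format this answer assumes (the real value is only visible in the question's screenshot). The pattern in this sketch escapes the literal parentheses and puts the digits in a capture group, so that $Matches[1] holds the number.

```powershell
# Hypothetical href in the format assumed by this answer
$sample = "javascript:openwindow('/../../123456789012.pdf', 'report')"

# \( and \) match literal parentheses; (\d+) captures the number
$pattern = "javascript:openwindow\('/\.\./\.\./(\d+)\.pdf'.*\)"

if ($sample -match $pattern) {
    $number = $Matches[1]
    Write-Host "Captured number: $number"
}
```

If the real href has a different path or different arguments, only the part of the pattern around (\d+)\.pdf needs to stay the same.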

Upvotes: 1
