Tazugan

Reputation: 129

Using powershell to read html content

Sorry for my limited knowledge of PowerShell. I am trying to read HTML content from a website and output it as a CSV file. Right now I can successfully download the whole HTML code with my PowerShell script:

$url = "http://cloudmonitor.ca.com/en/ping.php?vtt=1392966369&varghost=www.yahoo.com&vhost=_&vaction=ping&ping=start";
$Path = "$env:userprofile\Desktop\test.txt"

$ie = New-Object -com InternetExplorer.Application 
$ie.visible = $true
$ie.navigate($url)

while($ie.ReadyState -ne 4) { start-sleep -s 10 }

#$ie.Document.Body.InnerText | Out-File -FilePath $Path
$ie.Document.Body | Out-File -FilePath $Path
$ie.Quit()

I get HTML code something like this:

  ........
  <tr class="light-grey-bg">
  <td class="right-dotted-border">Stockholm, Sweden (sesto01):</td>
  <td class="right-dotted-border"><span id="cp20">Okay</span>
  </td>
  <td class="right-dotted-border"><span id="minrtt20">21.8</span>
  </td>
  <td class="right-dotted-border"><span id="avgrtt20">21.8</span>
  </td>
  <td class="right-dotted-border"><span id="maxrtt20">21.9</span>
  </td>
  <td><span id="ip20">2a00:1288:f00e:1fe::3001</span>
  </td>
  </tr>
  ........

But what I really want is to extract the content and output it to a CSV file like this:

Stockholm Sweden (sesto01),Okay,21.8,21.8,21.9,2a00:1288:f00e:1fe::3001
........

What command can help me achieve this task?
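A minimal sketch of one way to do this without extra libraries, assuming the saved file contains the page markup as plain text like the fragment above. It uses .NET regular expressions to pull the text out of each `<td>`, which is workable for a regular, machine-generated table like this one but is not a general HTML parser. The hard-coded sample row and the output path `$csvPath` are illustrative assumptions.

```powershell
# Sample row copied from the question. On the real page, load the saved
# markup instead, e.g.: $html = Get-Content -Path $Path -Raw
$html = @'
  <tr class="light-grey-bg">
  <td class="right-dotted-border">Stockholm, Sweden (sesto01):</td>
  <td class="right-dotted-border"><span id="cp20">Okay</span>
  </td>
  <td class="right-dotted-border"><span id="minrtt20">21.8</span>
  </td>
  <td class="right-dotted-border"><span id="avgrtt20">21.8</span>
  </td>
  <td class="right-dotted-border"><span id="maxrtt20">21.9</span>
  </td>
  <td><span id="ip20">2a00:1288:f00e:1fe::3001</span>
  </td>
  </tr>
'@

# Example output location (an assumption; adjust as needed)
$csvPath = Join-Path ([System.IO.Path]::GetTempPath()) 'ping.csv'

$lines = foreach ($row in [regex]::Matches($html, '(?s)<tr[^>]*>(.*?)</tr>')) {
    # Text of each <td>: strip nested tags such as <span>, decode entities, trim whitespace
    $cells = @(foreach ($td in [regex]::Matches($row.Groups[1].Value, '(?s)<td[^>]*>(.*?)</td>')) {
        [System.Net.WebUtility]::HtmlDecode(($td.Groups[1].Value -replace '<[^>]+>', '')).Trim()
    })
    # Skip rows without <td> cells (for example a header row built from <th>)
    if ($cells.Count -eq 0) { continue }
    # Requested format: no comma and no trailing colon in the location column
    $cells[0] = ($cells[0] -replace ',', '').TrimEnd(':')
    $cells -join ','
}

$lines | Set-Content -Path $csvPath
$lines
```

For the sample row this produces the line shown in the question, `Stockholm Sweden (sesto01),Okay,21.8,21.8,21.9,2a00:1288:f00e:1fe::3001`. `Get-Content -Raw` needs PowerShell 3.0 or later.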

Upvotes: 4

Views: 13898

Answers (1)

JPBlanc

Reputation: 72630

It was interesting for me too; thanks for the CA site. I wrote this quickly on the corner of my desk, so it needs improvements.

Here is a way using Html-Agility-Pack. In the following, I assume that HtmlAgilityPack.dll is in an Html-Agility-Pack subdirectory of the script's directory.

# PingFromTheCloud.ps1

$url = "http://cloudmonitor.ca.com/en/ping.php?vtt=1392966369&varghost=www.silogix.fr&vhost=_&vaction=ping&ping=start";
$Path = "c:\temp\Pingtest.htm"

$ie = New-Object -com InternetExplorer.Application 
$ie.visible = $true
$ie.navigate($url)

while($ie.ReadyState -ne 4) { start-sleep -s 10 }

#$ie.Document.Body.InnerText | Out-File -FilePath $Path
$ie.Document.Body | Out-File -FilePath $Path
$ie.Quit()

Add-Type -Path "$(Split-Path -parent $PSCommandPath)\Html-Agility-Pack\HtmlAgilityPack.dll"


# Parse the page saved above (c:\temp\Pingtest.htm) directly from disk
$webDoc = New-Object -TypeName HtmlAgilityPack.HtmlDocument
$webDoc.Load($Path)
$Thetable = $webDoc.DocumentNode.ChildNodes.Descendants('table') | where {$_.XPath -eq '/div[3]/div[1]/div[5]/table[1]/table[1]'}

$trDatas = $Thetable.ChildNodes.Elements("tr")

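# Start with a fresh CSV; ignore the error if the file does not exist yet
Remove-Item "c:\temp\Pingtest.csv" -ErrorAction SilentlyContinue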

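foreach ($trData in $trDatas)
{
  # Trimmed inner text of every <td> cell in this row
  $cells = @($trData.Elements("td") | ForEach-Object { $_.InnerText.Trim() })
  # Skip rows that have no <td> cells, such as a header row
  if ($cells.Count -eq 0) { continue }
  # Match the requested format: drop the comma and trailing colon from the location
  $cells[0] = ($cells[0] -replace ',', '').TrimEnd(':')
  $cells -join ',' | Out-File -FilePath "c:\temp\Pingtest.csv" -Append
}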
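Joining cells with commas by hand means any cell that itself contains a comma, such as the location "Stockholm, Sweden (sesto01):", would split into two columns unless it is cleaned up first. A hedged alternative is to build one object per row and let `ConvertTo-Csv` (or `Export-Csv`) handle the quoting. In this sketch the cell values are hard-coded to stand in for what `$tdData.InnerText.Trim()` returns, and the column names are hypothetical, derived from the span ids on the page.

```powershell
# Hypothetical cell texts for one row, standing in for the values
# collected from $tdData.InnerText.Trim() in the loop above
$cells = @('Stockholm, Sweden (sesto01):', 'Okay', '21.8', '21.8', '21.9', '2a00:1288:f00e:1fe::3001')

# Column names are assumptions based on the span ids (cp, minrtt, avgrtt, maxrtt, ip)
$row = [PSCustomObject]@{
    Location = $cells[0].TrimEnd(':')
    Status   = $cells[1]
    MinRtt   = $cells[2]
    AvgRtt   = $cells[3]
    MaxRtt   = $cells[4]
    IP       = $cells[5]
}

# ConvertTo-Csv emits a header line plus one line per object and wraps every
# field in double quotes, so the comma stays inside the Location column
$csvText = $row | ConvertTo-Csv -NoTypeInformation
$csvText
```

In the real script you would collect one such object per row (for example `$rows = foreach ($trData in $trDatas) { ... }`) and pipe `$rows` to `Export-Csv -Path c:\temp\Pingtest.csv -NoTypeInformation`, which overwrites the file and so also removes the need for `Remove-Item`. The output is quoted, unlike the example in the question, but CSV readers such as `Import-Csv` or Excel read it back as six columns.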

Upvotes: 1
