Reputation: 83
I'm using an Invoke-WebRequest to grab user information from a table on a website. The information is coming back as such:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<table border='0' cellspacing='0' cellpadding='2' width='640'><tr><td class='col1bold' width='10'> </td><td class='col1reversebold' width='10'> </td><td class='col1reversebold' width='5'> </td><td class='col1reversebold' width='130'>F
ull Name</td><td class='col1reversebold' width='80'>Label</td><td class='col1reversebold' width='70'>Call Status</td><td class='col1reversebold' width='60'>DND/Hold</td><td class='col1reversebold' width='125'>Location</td><td class='col1reversebold'
width='80'>Current Site</td></tr><tr><td class='col1bold' width='30'> </td><td class='col1'><img src=/console/images/static/B3000NW.gif></td><td class='col1'> </td><td class='col1'>Testing, Steve</td><td class='col1'>OR-72</td><td class='
col1'>Inactive</td><td class='col1'> </td><td class='col1'>703a0e88c252</td><td class='col1'>Global</td></tr><tr><td class='col1bold' width='30'> </td><td class='col1'><img src=/console/images/static/B3000B.gif></td><td class='col1'>
</td><td class='col1'>GreenTest, Connie</td><td class='col1'>OR-24</td><td class='col1'>Inactive</td><td class='col1'> </td><td class='col1'>Left Nurse Station</td><td class='col1'>Global</td></tr></table>
If I use the following syntax:
$GroupRequests = Invoke-WebRequest -Uri "$URI" -WebSession $session -Method GET
$HTMLTable = $GroupRequests.ParsedHtml.body.InnerHTML
I get:
<TABLE cellSpacing=0 cellPadding=2 width=640 border=0>
<TBODY>
<TR>
<TD class=col1bold width=10> </TD>
<TD class=col1reversebold width=10> </TD>
<TD class=col1reversebold width=5> </TD>
<TD class=col1reversebold width=130>Full Name</TD>
<TD class=col1reversebold width=80>Label</TD>
<TD class=col1reversebold width=70>Call Status</TD>
<TD class=col1reversebold width=60>DND/Hold</TD>
<TD class=col1reversebold width=125>Location</TD>
<TD class=col1reversebold width=80>Current Site</TD></TR>
<TR>
<TD class=col1bold width=30> </TD>
<TD class=col1><IMG src="/console/images/static/B3000NW.gif"></TD>
<TD class=col1> </TD>
<TD class=col1>Testing, Steve</TD>
<TD class=col1>OR-72</TD>
<TD class=col1>Inactive</TD>
<TD class=col1> </TD>
<TD class=col1>703a0e987f72</TD>
<TD class=col1>Global</TD></TR>
<TR>
<TD class=col1bold width=30> </TD>
<TD class=col1><IMG src="/console/images/static/B3000B.gif"></TD>
<TD class=col1> </TD>
<TD class=col1>GreenTest, Connie</TD>
<TD class=col1>OR-24</TD>
<TD class=col1>Inactive</TD>
<TD class=col1> </TD>
<TD class=col1>Left Nurse Station</TD>
<TD class=col1>Global</TD></TR></TBODY></TABLE>
How do I basically parse each row (<TR>
) and then get the information into columns and headers? I've tried so many options with no luck.
Upvotes: 0
Views: 1509
Reputation: 61218
A bit strange your ParsedHtml seems to have an XML declaration, but you could try the function below:
function ConvertFrom-HtmlTable {
[CmdletBinding(DefaultParameterSetName = 'ByIndex')]
param(
[Parameter(Mandatory = $true, Position = 0)]
[Microsoft.PowerShell.Commands.HtmlWebResponseObject]$WebRequest,
[Parameter(Mandatory = $false, Position = 1, ParameterSetName = 'ByIndex')]
[int]$TableIndex = 0,
[Parameter(Mandatory = $false, Position = 1, ParameterSetName = 'ById')]
[string]$TableId,
[Parameter(Mandatory = $false, Position = 1, ParameterSetName = 'ByName')]
[string]$TableName,
[Parameter(Mandatory = $false, Position = 1, ParameterSetName = 'ByClass')]
[string]$TableClassName,
# if there are no <th> tags, by appending this switch, the code
# uses the first row in the table as column headers.
[switch]$FirstRowHasHeaders
)
# Extract the table out of the web request
switch ($PSCmdlet.ParameterSetName) {
'ById' { $table = $WebRequest.ParsedHtml.getElementByID($TableId) }
'ByIndex' { $table = @($WebRequest.ParsedHtml.getElementsByTagName('table'))[$TableIndex]}
'ByName' { $table = @($WebRequest.ParsedHtml.getElementsByName($TableName))[0] }
'ByClass' { $table = @($WebRequest.ParsedHtml.getElementsByClassName($TableClassName))[0] }
}
$WebRequest.Dispose()
$WebRequest.Finalize() # free the COM objects
if (!$table) {
Write-Warning "Could not find the given table."
return $null
}
# load the System.Web assembly to be able to decode HTML entities
Add-Type -AssemblyName System.Web
$headers = @()
# Go through all of the rows in the table
foreach ($row in $table.Rows) {
$cells = @($row.Cells)
# If there is a table header or if we should consider the first row
# to be the header row, remember its titles
if ($FirstRowHasHeaders -or $cells[0].tagName -eq "TH") {
$i = 0
$headers = @($cells | ForEach-Object {
$i++
# decode HTML entities and double-up quotes that the value may contain
$th = ([System.Web.HttpUtility]::HtmlDecode($_.InnerText) -replace '"', '""').Trim()
# if the table header is empty, create it
if ([string]::IsNullOrEmpty($th)) { "H$i" } else { $th }
})
# we now have headers, set the switch to $false so the rest is parsed as data
$FirstRowHasHeaders = $false
# proceed with the next row
continue
}
# if we haven't found any table headers, make up names "H1", "H2", etc.
if(-not $headers) {
$headers = @(1..($cells.Count + 2) | ForEach-Object { "H$_" })
}
# Now go through the cells in the the row. For each, try to find the
# title that represents that column and create a hashtable mapping those
# titles to content
$hash = [Ordered]@{}
for ($i = 0; $i -lt $cells.Count; $i++) {
# decode HTML entities and double-up quotes that the value may contain
$value = ([System.Web.HttpUtility]::HtmlDecode($cells[$i].InnerText) -replace '"', '""').Trim()
$th = $headers[$i]
$hash[$th] = $value.Trim()
}
# And finally cast that hashtable to a PSCustomObject
[PSCustomObject]$hash
}
}
and use it like this:
$GroupRequests = Invoke-WebRequest -Uri $URI -WebSession $session -Method GET
# the table doesn't have a name or id attribute, so use its index (guessing its the first table)
# in your example the table also does not have <th> header tags, so append -FirstRowHasHeaders
# if you want the first data row to act as column headers or accept that headers are created as
# 'H1', 'H2', 'H3' etc. Cells containing only whitespace will always be translated to 'H<columnNumber>'
$users = ConvertFrom-HtmlTable -WebRequest $GroupRequests -TableIndex 0 -FirstRowHasHeaders
# show on screen
$users | Format-Table -AutoSize
# save as CSV file
$users | Export-Csv -Path 'X:\Somewhere\users.csv' -NoTypeInformation
Upvotes: 1