Reputation: 123
I have the monitoring page output below, which I am trying to parse into a variable.
<html>
<head>
<style type="text/css"></style>
</head>
<body>
<div style="float:left;margin-right:50px">
<div>DATA CENTERS WITH GLOBAL REPLICATION TIER ENABLED/SUSPENDED:
<div><br><br> DataCenter: DC1 NY [ENABLED]
<div><br> Active Zone : BW Zone 1[1], VIP = 192.168.254.10</div>
<div><br> <a href=https://192.168.254.10/checkGlobalReplicationTier>https://192.168.254.10/checkGlobalReplicationTier</a>
[ACTIVE]</div>
<div> <a href=https://192.168.254.10/checkReplication>https://192.168.254.10/checkReplication</a></div>
<div><br> <a href=https://192.168.254.11/checkGlobalReplicationTier>https://192.168.254.11/checkGlobalReplicationTier</a>
[STANDBY]</div>
<div> <a href=https://192.168.254.11/checkReplication>https://192.168.254.11/checkReplication</a></div>
<div><br> Local Zones:</div>
<div> LC Zone 3[3], VIP = 192.168.254.13
<div> <a href=https://192.168.254.13/checkReplication>https://192.168.254.13/checkReplication</a>
[ACTIVE]</div>
<div><br><br> DataCenter: DC2 NJ [ENABLED]
[DEFAULT DC]</div>
<div><br> Active Portal Zone : BW Zone 2[2], VIP = 192.168.253.10</div>
<div><br> <a href=https://192.168.253.10/checkGlobalReplicationTier>https://192.168.253.10/checkGlobalReplicationTier</a>
[ACTIVE]</div>
<div> <a href=https://192.168.253.10/checkReplication>https://192.168.253.10/checkReplication</a></div>
<div><br> <a href=https://192.168.253.11/checkGlobalReplicationTier>https://192.168.253.11/checkGlobalReplicationTier</a>
[STANDBY]</div>
<div> <a href=https://192.168.253.11/checkReplication>https://192.168.253.11/checkReplication</a></div>
<div><br> Local Zones:</div>
<div> LC Zone 4[4], VIP = 192.168.253.13
<div> <a href=https://192.168.253.13/checkReplication>https://192.168.253.13/checkReplication</a>
[ACTIVE]</div>
<div> <a href=https://192.168.253.14/checkReplication>https://192.168.253.14/checkReplication</a>
[STANDBY]</div>
--> </div>
</div>
</body>
</html>
I would like to parse this to get:
Data Center                    Active Zone   VIP             Local Zone    VIP
DC1 NY [Enabled]               BW Zone 1[1]  192.168.254.10  LC Zone 3[3]  192.168.254.13
DC2 NJ [Enabled] [DEFAULT DC]  BW Zone 2[2]  192.168.253.10  LC Zone 4[4]  192.168.253.13
My code below does not seem to be able to parse it. Is regex the best way to parse this page, or should I try some other approach?
$zone = "https://192.168.0.90/checkConfiguration"
$html = Invoke-WebRequest -Uri $zone -ErrorAction Stop
$DC = ($html.ParsedHtml.getElementsByTagName('div') | Where-Object { $_.InnerHTML -like '<div><br><br> DataCenter: *' }) | Foreach-Object {$_.outerText -replace '(?<!:.*):', '='} | %{new-object psobject -prop (ConvertFrom-StringData $_)}
Upvotes: 0
Views: 495
Reputation: 61028
For that you could do this:
$div = $html.ParsedHtml.getElementsByTagName('div') | Where-Object { $_.InnerHTML -like '<div>*DataCenter:*' }
$DC = if ($div -and $div.outerText -match '(?s)DataCenter\s*:\s*(\w+).*Active Zone\s*:\s*([^,]+),\s+VIP\s*=\s*([\d\.]+)') {
    [PsCustomObject]@{
        'DataCenter'  = $matches[1]
        'Active Zone' = $matches[2]
        'VIP'         = $matches[3]
    }
}
$DC | Format-Table -AutoSize
Output:
DataCenter Active Zone VIP
---------- ----------- ---
DC1 BW Zone 192.168.0.95
Or as a list:
$DC | Format-List
Output:
DataCenter : DC1
Active Zone : BW Zone
VIP : 192.168.0.95
Here's a different approach when multiple datacenters are in the html file:
# use outerText to get the plain text for the surrounding <div>DATA CENTERS WITH GLOBAL REPLICATION TIER ENABLED/SUSPENDED ...</div>
$content = ($html.ParsedHtml.getElementsByTagName('div') | Where-Object { $_.innerHtml -like '<div>DATA CENTERS*' }).outerText
$DC = $content -split 'DataCenter\s*:\s*' |
    Where-Object { $_ -match '(?s)([\w ]+(?:[ [\w\]]*)).*Active (?:Portal )?Zone\s*:\s*([^,]+),\s+VIP\s*=\s*([\d.]+)' } |
    ForEach-Object {
        [PsCustomObject]@{
            'DataCenter'  = $matches[1]
            'Active Zone' = $matches[2]
            'VIP'         = $matches[3]
        }
    }
$DC | Format-Table -AutoSize
Output:
DataCenter Active Zone VIP
---------- ----------- ---
DC1 NY [ENABLED] BW Zone 1[1] 192.168.254.10
DC2 NJ [ENABLED] [DEFAULT DC] BW Zone 2[2] 192.168.253.10
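The question also asks for the Local Zone and its VIP. The same split-then-match idea might be extended as in the rough sketch below. It runs on a hypothetical here-string that only approximates what outerText returns (the real whitespace and line breaks may differ), it takes the first line of each chunk as the data center name, and it captures only the first local zone per data center. The property name 'Local VIP' is my own choice, since one object cannot have two properties called 'VIP'.

```powershell
# Hypothetical sample approximating the outerText of the outer <div>.
# This is an assumption for illustration; adjust the pattern to the real text.
$content = @'
DATA CENTERS WITH GLOBAL REPLICATION TIER ENABLED/SUSPENDED:

DataCenter: DC1 NY [ENABLED]
Active Zone : BW Zone 1[1], VIP = 192.168.254.10
https://192.168.254.10/checkGlobalReplicationTier [ACTIVE]
https://192.168.254.10/checkReplication
Local Zones:
LC Zone 3[3], VIP = 192.168.254.13
https://192.168.254.13/checkReplication [ACTIVE]

DataCenter: DC2 NJ [ENABLED] [DEFAULT DC]
Active Portal Zone : BW Zone 2[2], VIP = 192.168.253.10
https://192.168.253.10/checkGlobalReplicationTier [ACTIVE]
Local Zones:
LC Zone 4[4], VIP = 192.168.253.13
https://192.168.253.13/checkReplication [ACTIVE]
'@

# Group 1: first line of the chunk (data center name and flags)
# Groups 2/3: active zone and its VIP; groups 4/5: first local zone and its VIP
$pattern = '(?s)^([^\r\n]+).*?Active (?:Portal )?Zone\s*:\s*([^,]+),\s+VIP\s*=\s*([\d.]+)' +
           '.*?Local Zones\s*:\s*([^,]+),\s+VIP\s*=\s*([\d.]+)'

$DC = $content -split 'DataCenter\s*:\s*' | ForEach-Object {
    # Chunks that do not match (such as the heading before the first data center) are skipped
    if ($_ -match $pattern) {
        [PsCustomObject]@{
            'DataCenter'  = $matches[1].Trim()
            'Active Zone' = $matches[2].Trim()
            'VIP'         = $matches[3]
            'Local Zone'  = $matches[4].Trim()
            'Local VIP'   = $matches[5]
        }
    }
}
$DC | Format-Table -AutoSize
```

If a data center can have several local zones, this would need a second pass over the text after "Local Zones:", for example with [regex]::Matches.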
Regex details:
(?s) Match the remainder of the regex with the options: dot matches newline (s)
( Match the regular expression below and capture its match into backreference number 1
   [\w ] Match a single character present in the list below
      A word character (letters, digits, etc.)
      The character “ ”
      + Between one and unlimited times, as many times as possible, giving back as needed (greedy)
   (?: Match the regular expression below
      [ [\w\]] Match a single character present in the list below
         One of the characters “ [”
         A word character (letters, digits, etc.)
         A ] character
         * Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
   )
)
. Match any single character
   * Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
Active\  Match the characters “Active ” literally
(?: Match the regular expression below
   Portal\  Match the characters “Portal ” literally
)? Between zero and one times, as many times as possible, giving back as needed (greedy)
Zone Match the characters “Zone” literally
\s Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
   * Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
: Match the character “:” literally
\s Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
   * Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
( Match the regular expression below and capture its match into backreference number 2
   [^,] Match any character that is NOT a “,”
      + Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
, Match the character “,” literally
\s Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
   + Between one and unlimited times, as many times as possible, giving back as needed (greedy)
VIP Match the characters “VIP” literally
\s Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
   * Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
= Match the character “=” literally
\s Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
   * Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
( Match the regular expression below and capture its match into backreference number 3
   [\d.] Match a single character present in the list below
      A single digit 0..9
      The character “.”
      + Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
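If you prefer to keep that explanation next to the pattern itself, the same regex could be written in free-spacing mode (?x) with named groups. This is only a sketch: the group names DataCenter, ActiveZone and VIP are my own, and the test string is a hypothetical single chunk as produced by the split. In (?x) mode unescaped whitespace outside a character class is ignored, so the literal spaces are written as [ ].

```powershell
# Same pattern as above, in free-spacing mode with named groups and inline comments
$pattern = @'
(?sx)
(?<DataCenter> [\w ]+ (?: [ [\w\]]* ) )       # name, then any bracketed flags
.*                                            # anything, including line breaks
Active[ ] (?:Portal[ ])? Zone \s* : \s*       # "Active Zone :" or "Active Portal Zone :"
(?<ActiveZone> [^,]+ )                        # zone name up to the comma
, \s+ VIP \s* = \s*
(?<VIP> [\d.]+ )                              # dotted IP address
'@

# Hypothetical chunk, as left over after splitting on 'DataCenter\s*:\s*'
$chunk = "DC2 NJ [ENABLED] [DEFAULT DC]`nActive Portal Zone : BW Zone 2[2], VIP = 192.168.253.10`n"

$result = if ($chunk -match $pattern) {
    [PsCustomObject]@{
        'DataCenter'  = $matches['DataCenter']
        'Active Zone' = $matches['ActiveZone']
        'VIP'         = $matches['VIP']
    }
}
$result | Format-List
```

Named groups are read from $matches by name instead of by number, which makes the object construction less fragile if the pattern is extended later.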
Upvotes: 1