My intent was to advise, on the question "Delete everything between two strings (inclusive)", using the HTMLDocument parser instead of a text-based replace command.
But somehow the OuterHTML property of the <aside> element doesn't include the whole element up to and including the </aside> end tag:
$Html = @'
<html>
<head>
<title>Title</title>
</head>
<body>
<h1>Some header elements</h1>
<aside>
<p>huge text in between aside</p>
</aside>
<div>
<p>huge text in between div</p>
</div>
<p>Some other elements</p>
</body>
</html>
'@
function ParseHtml($String) {
    $Unicode = [System.Text.Encoding]::Unicode.GetBytes($String)
    $Html = New-Object -Com 'HTMLFile'
    if ($Html.PSObject.Methods.Name -Contains 'IHTMLDocument2_Write') {
        $Html.IHTMLDocument2_Write($Unicode)
    }
    else {
        $Html.write($Unicode)
    }
    $Html.Close()
    $Html
}
$Document = ParseHtml $Html
For the <aside> element:
$Document.getElementsByTagName('aside') | ForEach-Object { $_.OuterHTML }
returns only the start tag:
<ASIDE>
For the <div> element:
$Document.getElementsByTagName('div') | ForEach-Object { $_.OuterHTML }
returns the whole element:
<DIV><P>huge text in between div</P></DIV>
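A diagnostic sketch that might help narrow this down: it lists every element the parser created, so the <aside> entry can be compared with the <div> entry. It reuses the $Document variable from above. My working assumption (not confirmed) is that the HTMLFile COM object parses in a legacy Internet Explorer document mode that may not recognize HTML5 elements such as <aside>.

```powershell
# Diagnostic sketch: relies on $Document as returned by ParseHtml above.
# Assumption: an unrecognized tag may show up as an element without content,
# which would explain an OuterHTML that holds only the start tag.
$Document.getElementsByTagName('*') | ForEach-Object {
    [PSCustomObject]@{
        TagName   = $_.tagName      # tag name as reported by the parser
        OuterHTML = $_.outerHTML    # markup the parser attributes to this element
    }
} | Format-List
```

If the <aside> entry shows only its start tag while the nested <p> appears as a separate entry, that would suggest the parser did not treat <aside> as a container.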
What is special about the <aside> element that explains the difference from other elements such as a <div>?
How can I get the OuterHTML of the <aside> element up to and including the </aside> end tag?