iRon

Why isn't the end tag included in an ASIDE element's OuterHTML?

My intent was to answer the question "Delete everything between two strings (inclusive)" with the advice to use the HTMLDocument parser instead of a text-based replace command.
But for some reason, the OuterHTML property of the <aside> element doesn't include the element's content up to and including the </aside> end tag:

html

$Html = @'
<html>
    <head>
        <title>Title</title>
    </head>
    <body>
        <h1>Some header elements</h1>
        <aside>
            <p>huge text in between aside</p>
        </aside>
        <div>
            <p>huge text in between div</p>
        </div>
        <p>Some other elements</p>
    </body>
</html>
'@

Parsing

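function ParseHtml($String) {
    # The COM write methods are fed the markup as a UTF-16 (Unicode) byte array
    $Unicode = [System.Text.Encoding]::Unicode.GetBytes($String)
    $Html = New-Object -Com 'HTMLFile'
    # IHTMLDocument2_Write is only exposed on some systems (it depends on the
    # mshtml interop assembly being available); otherwise fall back to write()
    if ($Html.PSObject.Methods.Name -Contains 'IHTMLDocument2_Write') {
        $Html.IHTMLDocument2_Write($Unicode)
    }
    else {
        $Html.write($Unicode)
    }
    # Finish writing so the document is fully parsed
    $Html.Close()
    $Html
}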
$Document = ParseHtml $Html

<aside>

$Document.getElementsByTagName('aside') | ForEach-Object { $_.OuterHTML }
<ASIDE>

<div>

$Document.getElementsByTagName('div') | ForEach-Object { $_.OuterHTML }

<DIV><P>huge text in between div</P></DIV>
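For comparison, here is a minimal sketch that swaps in a different parser. It is not a fix for the HTMLFile behavior. Because this particular sample happens to be well-formed XML, PowerShell's `[xml]` type accelerator (System.Xml.XmlDocument) can load it, and the `<aside>` node's OuterXml then runs through the closing `</aside>` tag. This rests on a strong assumption: real-world HTML is often not well-formed XML (void elements such as `<br>`, unquoted attributes, entities like `&nbsp;`), and `[xml]` will throw on it. The helper name `Remove-XmlElement` is hypothetical.

```powershell
# Sketch: assumes the input is well-formed XML (true for this sample,
# not for HTML in general). Uses System.Xml, not the HTMLFile COM object.
$Html = @'
<html>
    <head>
        <title>Title</title>
    </head>
    <body>
        <h1>Some header elements</h1>
        <aside>
            <p>huge text in between aside</p>
        </aside>
        <div>
            <p>huge text in between div</p>
        </div>
        <p>Some other elements</p>
    </body>
</html>
'@

# Hypothetical helper: removes every element with the given tag name, in place
function Remove-XmlElement([System.Xml.XmlDocument]$Document, [string]$TagName) {
    # Materialize the node list first so removal doesn't disturb enumeration
    $Nodes = @($Document.SelectNodes("//$TagName"))
    foreach ($Node in $Nodes) {
        [void]$Node.ParentNode.RemoveChild($Node)
    }
}

$Document = [xml]$Html

# OuterXml spans the whole element, including the </aside> end tag
$AsideXml = $Document.SelectSingleNode('//aside').OuterXml
$AsideXml

# Remove the <aside> element (start tag, content, and end tag)
Remove-XmlElement $Document 'aside'
$Document.OuterXml
```

The helper deliberately returns nothing: it mutates the document it is given, which avoids any question of how PowerShell emits an XmlDocument to the pipeline.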
