Callie J

Reputation: 31336

Discrepancy between getElementById and getElementsByTagName in PowerShell

I'm having a problem with PowerShell and MSHTML: getElementById and getElementsByTagName are giving me different results.

I have this HTML file:

<html>
    <head>
        <title>Moo</title>
    </head>
    <body>
        <h1>Hello world!</h1>
        <ul id="a">
            <li>Bob
            <li>Cat
            <li>Fish
        </ul>
        <ul id="b">
            <li>Bob
            <li>Dog
            <li>Cow
        </ul>
        <div id="x">
            <b>What's the problem? Don't you <i>like soup</b> like this?</i>.
        </div>
    </body>
</html>

Yes, it's purposely meant to be badly nested as I'm playing with ideas (and testing for limitations) at the moment.

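I'm loading this file into PowerShell with MSHTML as follows (from https://stackoverflow.com/a/24989452/130352):

# $htmlFileName holds the path to the HTML file above
$html = new-object -ComObject "HTMLFile"
$source = get-content -path $htmlFileName -raw   # read the whole file as a single string
$html.IHtmlDocument2_write($source)              # parse the markup into the MSHTML document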

If I now do:

$d = $html.getElementById("x")
$d.TagName

I get the expected output (DIV). However, if I do:

$d = $html.getElementsByTagName("div")
$d[0].TagName

I get no output. $d.length returns 1, and $d[0] | ft lists all the properties and their values without a problem. But accessing properties of $d[0] directly returns nothing: innerText, outerHtml, etc. all come back empty. By contrast, when I find the element by ID (with getElementById), I can access its properties without any trouble.

Assigning the element to a variable first (e.g., $z = $d[0]) and then working directly on $z gives the same result.

What am I missing?

Upvotes: 2

Views: 1646

Answers (1)

arco444

Reputation: 22881

Nice question.

I can't explain exactly what's happening, but it seems that

$d = $html.getElementsByTagName("div")

returns a COM collection whose elements PowerShell doesn't know how to manipulate natively.

If you select some (or all) of the properties, however, the result does get fully populated:

($d | select *)[0].tagname

So when using the getElementsByTagName method, you could do:

$d = $html.getElementsByTagName("div") | select *
$d[0].tagname

It seems that piping the COM object to Select-Object produces a PowerShell PSCustomObject holding the selected property values, which you can then work with:

[PS]> ($html.getelementsbytagname("div") | select *).gettype()

IsPublic IsSerial Name                                     BaseType
-------- -------- ----                                     --------
True     False    PSCustomObject                           System.Object


[PS]> ($html.getelementsbytagname("div")).gettype()

IsPublic IsSerial Name                                     BaseType
-------- -------- ----                                     --------
True     False    __ComObject                              System.MarshalByRefObject

I'm not 100% sure this "answers" the question, since I can't fully explain the behaviour, but it's a viable workaround at least.

Upvotes: 2
