Reputation: 31336
I'm having a problem with PowerShell and MSHTML: getElementById and getElementsByTagName are giving me different results.
I have this HTML file:
<html>
<head>
<title>Moo</title>
</head>
<body>
<h1>Hello world!</h1>
<ul id="a">
<li>Bob
<li>Cat
<li>Fish
</ul>
<ul id="b">
<li>Bob
<li>Dog
<li>Cow
</ul>
<div id="x">
<b>What's the problem? Don't you <i>like soup</b> like this?</i>.
</div>
</body>
</html>
Yes, it's purposely badly nested, as I'm playing with ideas (and testing for limitations) at the moment.
I'm loading this file into PowerShell with MSHTML as follows (from https://stackoverflow.com/a/24989452/130352):
$html = new-object -ComObject "HTMLFile"
$source = get-content -path $htmlFileName -raw
$html.IHtmlDocument2_write($source)
If I now do:
$d = $html.getElementById("x")
$d.TagName
I get the expected output (DIV). However, if I do:
$d = $html.getElementsByTagName("div")
$d[0].TagName
I get no output. $d.length returns 1. $d[0] | ft spits out all the properties and values without a problem. But accessing properties on $d[0] directly doesn't return anything (i.e., asking for innerText, outerHtml, etc. returns no data), whereas finding the element by ID (i.e., with getElementById) works and I can access the properties.
Similarly, if I assign the element to a variable (i.e., $z = $d[0]) and then work directly on $z, I get the same issue.
What am I missing?
Upvotes: 2
Views: 1646
Reputation: 22881
Nice question.
I can't explain exactly what's happening, but it seems that
$d = $html.getElementsByTagName("div")
returns a collection that PowerShell doesn't know how to natively manipulate.
If you attempt to select some (or all) of the properties, then the variable does get fully populated:
($d | select *)[0].tagname
If you want to keep using the getElementsByTagName method, you could do:
$d = $html.getElementsByTagName("div") | select *
$d[0].tagname
It seems that piping the COM object to Select-Object turns it into a PowerShell PSCustomObject (a copy of the property values rather than the live COM object), which you're then able to work with:
[PS]> ($html.getelementsbytagname("div") | select *).gettype()
IsPublic IsSerial Name                                     BaseType
-------- -------- ----                                     --------
True     False    PSCustomObject                           System.Object
[PS]> ($html.getelementsbytagname("div")).gettype()
IsPublic IsSerial Name                                     BaseType
-------- -------- ----                                     --------
True     False    __ComObject                              System.MarshalByRefObject
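One caveat with this workaround is that Select-Object * copies the current property values into a new object, so what you get back is a snapshot and no longer the live element. The following is only a rough sketch of that behaviour, not the MSHTML case itself: it uses a System.Text.StringBuilder as a hypothetical stand-in for the COM element so it can be run anywhere, and the variable names $live and $snapshot are made up for illustration.

```powershell
# Stand-in for the COM element (an assumption for illustration only):
# any live .NET object with readable properties will do.
$live = [System.Text.StringBuilder]::new('DIV')

# Select-Object * copies the current property values into a new PSCustomObject.
$snapshot = $live | Select-Object *

$snapshot.GetType().Name   # PSCustomObject
$snapshot.Length           # 3

# Changing the live object afterwards does not update the snapshot.
[void]$live.Append('-changed')
$live.Length               # 11
$snapshot.Length           # still 3
```

So the snapshot is fine for reading values such as tagName or innerText, but for changing the document or calling the element's methods you would presumably still need the original COM object.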
I'm not 100% sure this "answers" the question, as I can't fully explain the behaviour, but it's a viable workaround at the least.
Upvotes: 2