How to work with XPath on strings containing HTML in PowerShell?

Question

I want to extract values from an HTML document. In another program (ui.vision / Selenium) I can do this with XPath expressions. I have already worked out a large number of working XPaths, and now I want to use them in PowerShell. I have a string, $html, containing the whole document, from the opening tag to the closing tag (inclusive).

As far as I can tell from my research, I need an XML object in order to use Select-Xml with XPath expressions.
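
As a minimal sketch of that approach, assuming the markup happens to be well-formed XML: Select-Xml can take the string directly through its -Content parameter and returns objects whose Node property holds the matched node. The sample markup, the URLs, and the XPath below are made up for illustration and are not taken from the real page.

```powershell
# Hypothetical sample: a tiny, well-formed (XHTML-style) document.
$sample = '<html><body><a href="https://example.com/a">First</a><a href="https://example.com/b">Second</a></body></html>'

# Select-Xml parses the string passed to -Content and evaluates the XPath against it.
# Each result exposes the matched node through its Node property.
$hrefs = Select-Xml -Content $sample -XPath '//a/@href' |
    ForEach-Object { $_.Node.Value }

# Print the extracted attribute values.
$hrefs
```

This only works because every tag is closed and every attribute value is quoted.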

In order to convert $html to xml I tried to cast:

[xml]$xml = $html

as well as

$xml = [xml]$html

and I tried to convert:

$html = $html | ConvertTo-Xml

All of these failed. I think the HTML would need to be well-formed XML, but it is not (even though it is valid HTML and passes the W3C validator without warnings). It is minified, and most attribute values lack quotation marks.
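
The failures can be reproduced on a small scale. The sketch below uses a hypothetical one-line fragment (not the real page) with an unquoted attribute value, which HTML permits but XML does not. It also shows that ConvertTo-Xml does not parse markup at all: it serializes the input object, so the HTML ends up as plain text inside the generated document.

```powershell
# Hypothetical minified fragment: the attribute value is unquoted.
$fragment = '<p class=intro>Hello</p>'

# The [xml] cast uses a strict XML parser, so the unquoted value makes it throw.
$castFailed = $false
try {
    $null = [xml]$fragment
}
catch {
    $castFailed = $true
}
"Cast failed: $castFailed"

# ConvertTo-Xml serializes the string object instead of parsing it.
# The markup survives only as the text content of the generated document.
$serialized = $fragment | ConvertTo-Xml
$serializedText = $serialized.DocumentElement.InnerText
$serializedText

# Consequently an XPath for the <p> element finds nothing in that document.
$matches_p = @(Select-Xml -Xml $serialized -XPath '//p')
"Matches for //p: $($matches_p.Count)"
```

So neither the cast nor ConvertTo-Xml yields a document that the existing XPaths could be run against as long as the input is not well-formed XML.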

So how can I get XPath to work on a string containing an HTML page? I am about to resort to regular expressions, but translating all the XPath expressions looks like a lot of work.
