user1126270

Reputation: 25

How can I extract a value from an HTML page in VBScript? I tried MSXML2.DOMDocument

Below is the code I tried for getting the value of a node in a web page, but it fails when trying to set objNode. Any help would be gratefully appreciated.

Dim objHttp, sWebPage, objNode, objDoc

Set objDoc = CreateObject("MSXML2.DOMDocument")
objDoc.Load "http://www.hl.co.uk/shares/shares-search-results/a/aveva-group-plc-ordinary-3.555p"

' objDoc.setProperty "SelectionLanguage", "XPath"

' Find a particular element using XPath:
Set objNode = objDoc.selectSingleNode("span[@id='ls-bid-AVV-L']")
MsgBox objNode.getAttribute("value")

Upvotes: 1

Views: 2905

Answers (2)

Ekkehard.Horner

Reputation: 38745

  1. It's very optimistic to expect an XML parser to handle even clean HTML, because HTML does not have to be well-formed XML (void elements such as <input> and <br> are never closed); for flawed HTML, you can forget it (ref).
  2. You should never .load without checking for errors (see also). In your case, the .parseError.reason reported is "The attribute 'property' on this element is not defined in the DTD/Schema."
  3. You can switch off validation with objDoc.validateOnParse = False, and avoid problems with very large pages by loading synchronously with objDoc.async = False (at least, that gets rid of the "msxml3.dll: The data necessary to complete this operation is not yet available." error).
  4. To search for a span anywhere (without knowing its place in the hierarchy) you need "//span[@id='ls-bid-AVV-L']" instead of "span[@id='ls-bid-AVV-L']".
  5. The span to find has no attribute named value; to get the "1,334.00p" you'd need to ask for objNode.text.
  6. But all this is to no avail: The page is not even well-formed. The .parseError.reason is "End tag 'div' does not match the start tag 'input'.".
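A minimal sketch of points 2 to 5 combined, written as a VBScript function. To keep it self-contained, it parses a small, hypothetical, well-formed snippet with loadXML instead of calling .load on the real URL; the same parseError check applies after .load. The function name GetSpanText and the sample markup are illustrative assumptions, not part of the original answer.

```vbscript
' Hypothetical helper: returns the text of the span with the given id,
' Empty if no such span exists, and raises an error if parsing fails.
Function GetSpanText(markup, spanId)
    Dim objDoc, objNode
    Set objDoc = CreateObject("MSXML2.DOMDocument")
    objDoc.async = False             ' parse synchronously (point 3)
    objDoc.validateOnParse = False   ' do not validate against a DTD (point 3)
    objDoc.setProperty "SelectionLanguage", "XPath"

    ' loadXML (like load) returns False on failure; details are in parseError (point 2)
    If Not objDoc.loadXML(markup) Then
        Err.Raise vbObjectError + 1, "GetSpanText", objDoc.parseError.reason
    End If

    ' "//" searches the whole tree, wherever the span sits (point 4)
    Set objNode = objDoc.selectSingleNode("//span[@id='" & spanId & "']")
    If objNode Is Nothing Then
        GetSpanText = Empty
    Else
        GetSpanText = objNode.text   ' the span's content, not a "value" attribute (point 5)
    End If
End Function

' Usage with a hypothetical, well-formed stand-in for the page's markup
WScript.Echo GetSpanText("<html><body><div><span id='ls-bid-AVV-L'>1,334.00p</span></div></body></html>", "ls-bid-AVV-L")
```

Fed markup that is not well-formed, such as "<div><input></div>", the function raises an error whose description is the parser's reason, mirroring point 6.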

Upvotes: 3

Ansgar Wiechers

Reputation: 200293

Use the Internet Explorer COM object:

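' Let Internet Explorer parse the page the way a browser does
url = "http://www.hl.co.uk/shares/shares-search-results/a/aveva-group-plc-ordinary-3.555p"

Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = True
ie.Navigate url
' Wait until the page has finished loading (READYSTATE_COMPLETE = 4)
While ie.Busy Or ie.ReadyState <> 4
  WScript.Sleep 100
Wend

' The HTML DOM provides getElementById; innerText returns the span's text
MsgBox ie.document.getElementById("ls-bid-AVV-L").innerText

ie.Quit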

Upvotes: 1
