noc_coder
noc_coder

Reputation: 349

Parsing DOM using XML (VBA)

I am trying to write code that will sift through this DOM structure:

<html>
 <head>
  <body>
   <table id="the-table" border="1">
    <thead>
    <tbody>
     <tr> </tr>
     <tr>
      <td class="x-grid3-hd-inner" bgcolor="#8dd5e7" colspan="7">
     </tr>
     <tr>
      <td class="x-grid3-hd-inner" bgcolor="#8dd5e7" colspan="7">
     </tr>
     <tr>
     <tr>
      <td class="oneline">2</td>
      <td class="oneline">ENB</td>
      <td class="oneline">2</td>
      <td class="oneline">CELL_99</td>
      <td class="oneline">255.255.255.0</td>
      <td class="oneline">My Group</td>
      <td class="oneline">*</td>
     </tr>
     <tr>
     <tr>
     <tr>
     ...
     <tr>
     <tr>
   </tbody>
  </table>
 </body>
</html>

I am trying to extract the text at each td element for all tr elements of the table. I expanded one example out above. All td elements of the table are formatted using the same html structure (besides the title of the table). This is the method that I have used so far.

Sub ParseWebPage(url As String, sheet As String, searchCrit As String)
    Dim objXML As MSXML2.DOMDocument
    Set objXML = New MSXML2.DOMDocument
    Set htm = CreateObject("htmlFile")
    With CreateObject("msxml2.xmlhttp")
      .Open "GET", url, False
      .send
      xmlresp = .responseText
    End With
    objXML.loadXML (xmlresp)
    Dim objElem As MSXML2.IXMLDOMElement
    Debug.Print xmlresp

    objXML.loadXML (xmlresp)
    Set objElem = objXML.selectSingleNode("tr")
    Debug.Print "Found" & objElem.text
End Sub

The problem is, every time my objElem returns back empty. I also tried using a NodeList instead of IXMLDOMElement but it always returned empty.

I believe the issue to be the string argument. I have tried using "tr", "oneline", "/html/body/table/tbody", and creating a loop for each "/html/body/table/tbody/tr[x]/td[y]" but none of these were effective.

Can someone help me out here?

Upvotes: 0

Views: 861

Answers (3)

Cookie Monster
Cookie Monster

Reputation: 33

Sub test()
  Dim objList As MSXML2.IXMLDOMNodeList
  Dim objxml As New MSXML2.DOMDocument
  Dim i As Integer

  objxml.Load ("C:\test.xml") 'used load, loadXML would be correct for your use
  Set objList = objxml.SelectNodes("//tr/td")
  For i = 0 To objList.Length - 1
      Debug.Print objList.Item(i).Text
  Next i
End Sub

Used the above code on the following:

<html>
 <head>
  <body>
   <table id="the-table" border="1">
    <thead>
     <tbody>
      <tr>
      <td class="oneline">2</td>
      <td class="oneline">ENB</td>
      <td class="oneline">2</td>
      <td class="oneline">CELL_99</td>
      <td class="oneline">255.255.255.0</td>
      <td class="oneline">My Group</td>
      <td class="oneline">*</td>
     </tr>
   </tbody>
   </thead>
  </table>
 </body>
 </head>
</html>

Had the input file saved as a .xml. I got the desired results from this. This lets me believe that one of the following are happening:

  1. Your earlier xmlresp is not well formed. Can you check it or export objxml to see if it formed correctly?
  2. Your input string is too large for vba's msxml2. I once had this happen to me where xfdf data from adobe exceeded some maximum string length which lead to the input not being formed properly. When I ran a XSL outside of vba or removed the field with the long strings, it worked.
  3. There is a difference between how XML and HTML are treated. I am not that familiar with HTML, so can't comment on that part of your code too much

Upvotes: 1

noc_coder
noc_coder

Reputation: 349

Sub ParseWebPage(url As String, sheet As String, searchCrit As String)
 Dim objXML As MSXML2.DOMDocument
 Set objXML = New MSXML2.DOMDocument
 Set htm = CreateObject("htmlFile")
 Dim tableData() As String
 Dim openPos, closePos As Integer
 Dim midPart As String

 With CreateObject("msxml2.xmlhttp")
    .Open "GET", url, False
    .send
    xmlresp = .responseText
 End With
 objXML.loadXML (xmlresp)
 tableData = Split(xmlresp, searchCrit)

 For i = 12 To UBound(tableData) - 1
    openPos = InStr(tableData(i), Chr(34) & ">")
    closePos = InStr(tableData(i), "</td>")
    midPart = mid(tableData(i), openPos + 2, closePos - openPos - 2)
    Debug.Print midPart
 Next i
End Sub

This is an ok solution for now. But, I will continue to do more research on the topic. I ended up not using the XML libraries at all.

Upvotes: 0

Cookie Monster
Cookie Monster

Reputation: 33

If you are trying to get all of the texts ("2", "ENB" etc.) in the elements of td, try the following:

Dim objList As MSXML2.IXMLDOMNodeList
Set objList = objXML.SelectNodes("//tr/td")
For i = 0 To objList.Length - 1
   Debug.Print objList.Item(i)
Next i

Hope that works. Didn't have time to test, but copied from similar working code I have.

Upvotes: 0

Related Questions