Reputation: 349
I am trying to write code that will sift through this DOM structure:
<html>
<head>
<body>
<table id="the-table" border="1">
<thead>
<tbody>
<tr> </tr>
<tr>
<td class="x-grid3-hd-inner" bgcolor="#8dd5e7" colspan="7">
</tr>
<tr>
<td class="x-grid3-hd-inner" bgcolor="#8dd5e7" colspan="7">
</tr>
<tr>
<tr>
<td class="oneline">2</td>
<td class="oneline">ENB</td>
<td class="oneline">2</td>
<td class="oneline">CELL_99</td>
<td class="oneline">255.255.255.0</td>
<td class="oneline">My Group</td>
<td class="oneline">*</td>
</tr>
<tr>
<tr>
<tr>
...
<tr>
<tr>
</tbody>
</table>
</body>
</html>
I am trying to extract the text at each td element for all tr elements of the table. I expanded one example out above. All td elements of the table are formatted using the same html structure (besides the title of the table). This is the method that I have used so far.
Sub ParseWebPage(url As String, sheet As String, searchCrit As String)
Dim objXML As MSXML2.DOMDocument
Set objXML = New MSXML2.DOMDocument
Set htm = CreateObject("htmlFile")
With CreateObject("msxml2.xmlhttp")
.Open "GET", url, False
.send
xmlresp = .responseText
End With
objXML.loadXML (xmlresp)
Dim objElem As MSXML2.IXMLDOMElement
Debug.Print xmlresp
objXML.loadXML (xmlresp)
Set objElem = objXML.selectSingleNode("tr")
Debug.Print "Found" & objElem.text
End Sub
The problem is, every time my objElem returns back empty. I also tried using a NodeList instead of IXMLDOMElement but it always returned empty.
I believe the issue to be the string argument. I have tried using "tr", "oneline", "/html/body/table/tbody", and creating a loop for each "/html/body/table/tbody/tr[x]/td[y]" but none of these were effective.
Can someone help me out here?
Upvotes: 0
Views: 861
Reputation: 33
Sub test()
Dim objList As MSXML2.IXMLDOMNodeList
Dim objxml As New MSXML2.DOMDocument
Dim i As Integer
objxml.Load ("C:\test.xml") 'used load, loadXML would be correct for your use
Set objList = objxml.SelectNodes("//tr/td")
For i = 0 To objList.Length - 1
Debug.Print objList.Item(i).Text
Next i
End Sub
Used the above code on the following:
<html>
<head>
<body>
<table id="the-table" border="1">
<thead>
<tbody>
<tr>
<td class="oneline">2</td>
<td class="oneline">ENB</td>
<td class="oneline">2</td>
<td class="oneline">CELL_99</td>
<td class="oneline">255.255.255.0</td>
<td class="oneline">My Group</td>
<td class="oneline">*</td>
</tr>
</tbody>
</thead>
</table>
</body>
</head>
</html>
Had the input file saved as a .xml. I got the desired results from this. This lets me believe that one of the following are happening:
Upvotes: 1
Reputation: 349
Sub ParseWebPage(url As String, sheet As String, searchCrit As String)
Dim objXML As MSXML2.DOMDocument
Set objXML = New MSXML2.DOMDocument
Set htm = CreateObject("htmlFile")
Dim tableData() As String
Dim openPos, closePos As Integer
Dim midPart As String
With CreateObject("msxml2.xmlhttp")
.Open "GET", url, False
.send
xmlresp = .responseText
End With
objXML.loadXML (xmlresp)
tableData = Split(xmlresp, searchCrit)
For i = 12 To UBound(tableData) - 1
openPos = InStr(tableData(i), Chr(34) & ">")
closePos = InStr(tableData(i), "</td>")
midPart = mid(tableData(i), openPos + 2, closePos - openPos - 2)
Debug.Print midPart
Next i
End Sub
This is an ok solution for now. But, I will continue to do more research on the topic. I ended up not using the XML libraries at all.
Upvotes: 0
Reputation: 33
If you are trying to get all of the texts ("2", "ENB" etc.) in the elements of td, try the following:
Dim objList As MSXML2.IXMLDOMNodeList
Set objList = objXML.SelectNodes("//tr/td")
For i = 0 To objList.Length - 1
Debug.Print objList.Item(i)
Next i
Hope that works. Didn't have time to test, but copied from similar working code I have.
Upvotes: 0