ShanayL
ShanayL

Reputation: 1247

Retreiving <TD> tag from table on website using VBA and put into excel

I am trying to retrieve information from a <TD> tag on a website.

It works but I cant seem to get the text from the second <td> tag in a <TR> tag while using a conditional statement to get the second tag as this is he only way I see that works. The code works fine to extract information I just cant figure out how to access that second with the condition that I have found a match in the first <td>.

So the actual html table would look like this.

<html>
<head></head>
<body>
<table id="Table2">
<tr>
  <td class="tSystemRight">System Name: -if this matches</td>
  <td class="tSystemLeft breakword">Windows3756 -I need this</td>
</tr>
<tr>
  <td class="tSystemRight">System Acronym: -if this matches</td>
  <td class="tSystemLeft breakword">WIN37  -I need this</td>
</tr>
</table>
</body>
</html>

The VBA script I have is:

excelRow = 2

For Each tr In msxml.tableRows
cellCount = 1
   For Each TD In tr.getElementsByTagName("TD")
    If ((cellCount = 1) And (TD.innerText = "System Acronym:")) Then
       Worksheets("Data").Cells(excelRow, 2).value = Cells(1, 2)
    ElseIf ((cellCount = 1) And (TD.innerText = "System Name:")) Then
       Worksheets("Data").Cells(excelRow, 3).value = Cells(1, 2)
    cellCount = cellCount + 1
    End If
   Next
Next

This just displays System Name: and System Acronym: in the excel sheet

Upvotes: 3

Views: 11244

Answers (2)

barrowc
barrowc

Reputation: 10679

If you have a td element and you want to get the inner text of the next td in the row then use the nextSibling property, like this:

For Each td In tr.getElementsByTagName("TD")
    If ((cellCount = 1) And (td.innerText = "System Acronym:")) Then
       Worksheets("Data").Cells(excelRow, 2).Value = td.NextSibling.innerText
    ElseIf ((cellCount = 1) And (td.innerText = "System Name:")) Then
       Worksheets("Data").Cells(excelRow, 3).Value = td.NextSibling.innerText
    cellCount = cellCount + 1
    End If
   Next
Next

Note that nothing in the given code is changing the value of excelRow so everything will keep getting written into the same row. Also note that the HTML given has the "System Name" first and the "System Acronym" second whereas the code seems to be structured to look for "System Acronym" first and "System Name" second

Upvotes: 3

Scott Holtzman
Scott Holtzman

Reputation: 27249

I developed the following from a public website with almost identical structure to yours. (https://www.federalreserve.gov/releases/h3/current/)

Requires Reference to Microsoft Internet Controls and Microsoft HTML Object Library

Option Explicit

Sub Test()

Dim ie As New InternetExplorer
Dim doc As New HTMLDocument

With ie

    .Visible = True
    .Navigate "https://www.federalreserve.gov/releases/h3/current/"

    'can place code to wait for IE to load here .. I skipped it since its not in direct focus of question

    Set doc = .Document

    Dim t As HTMLTable
    Dim r As HTMLTableRow
    Dim c As HTMLTableCol

    Set t = doc.getElementById("t1tg1")

    'loop through each row
    For Each r In t.Rows

        If r.Cells(0).innerText = "Mar. 2016" Then Debug.Print r.Cells(1).innerText

        'loop through each column in the row
        'For Each c In r.Cells

        '    Debug.Print c.innerText

        'Next

    Next

End With

End Sub

All that said, after setting your specific table like I have above, I suggest the following edit to your code (I have left out the cellcount check and other stuff):

For Each r In t.Rows

    'find out which columns System Acronym and value will be and modify the Cells(n) statements          
    If r.Cells(0).innerText = "System Acronym:" Then Worksheets("Data").Cells(excelRow, 2).Value = r.Cells(2).innerText

Next

Upvotes: 2

Related Questions