Quint
Quint

Reputation: 590

VBA: Scraping information from HTML Table

I'm trying to pull information from an html table. I want to add each element with in the table to a collection. This is what I have so far.

Dim htmlTable As Object
Dim coll2 As Collection
Set coll2 = New Collection
Set IE = New InternetExplorerMedium

With IE
'.AddressBar = False
'.MenuBar = False
.Navigate ("PASSWORDED SITE")
.Visible = True
End With

Set htmlTable = IE.Document.getElementById("ctl00_ContentPlaceHolder1_gvExtract")
Set tableItem = IE.Document.getElementsByTagName("td")
With coll2
For Each tableItem In htmlTable.innerHTML
   .Add tableItem
Next
End With

I have a problem with this line For Each tableItem In htmlTable.innerText I tried diffent variations of htmlTable.innerText each throwing differant errors.

This is the HTML Extract for the table.

<table class="Grid" id="ctl00_ContentPlaceHolder1_gvExtract" style="border-collapse: collapse;" border="1" rules="all" cellspacing="0">
        <tbody><tr class="GridHeader" style="font-weight: bold;">
            <th scope="col">Delete</th><th scope="col">Download</th><th scope="col">Extract Date</th><th scope="col">User Id Owner</th>
        </tr><tr class="GridItemOdd" style="background-color: rgb(255, 255, 255);">
            <td><a href='javascript:DoPostBack("DeleteExtract", 2942854)'>Delete</a></td>
            <td><a href='javascript:OpenDownloadWindow("../Common/FileDownloader.aspx?fileKey=2942854")'>Work Order Inquiry - Work Order</a></td>
            <td>06/20/2017 07:50:37</td>
            <td>MBMAYO</td>
        </tr><tr class="GridItemEven" style="background-color: rgb(204, 204, 204);">
            <td><a href='javascript:DoPostBack("DeleteExtract", 2942836)'>Delete</a></td>
            <td><a href='javascript:OpenDownloadWindow("../Common/FileDownloader.aspx?fileKey=2942836")'>Work Order Inquiry - Work Order</a></td>
            <td>06/20/2017 07:39:29</td>
            <td>MBMAYO</td>
        </tr><tr class="GridItemOdd" style="background-color: rgb(255, 255, 255);">
            <td><a href='javascript:DoPostBack("DeleteExtract", 2941835)'>Delete</a></td><td><a href='javascript:OpenDownloadWindow("../Common/FileDownloader.aspx?fileKey=2941835")'>Work Order Inquiry - Work Order</a></td><td>06/20/2017 07:23:54</td><td>MBMAYO</td>
        </tr><tr class="GridItemEven" style="background-color: rgb(204, 204, 204);">
            <td><a href='javascript:DoPostBack("DeleteExtract", 2941827)'>Delete</a></td><td><a href='javascript:OpenDownloadWindow("../Common/FileDownloader.aspx?fileKey=2941827")'>Work Order Inquiry - Work Order</a></td><td>06/20/2017 07:16:16</td><td>MBMAYO</td>
        </tr><tr class="GridItemOdd" style="background-color: rgb(255, 255, 255);">
            <td><a href='javascript:DoPostBack("DeleteExtract", 2941822)'>Delete</a></td><td><a href='javascript:OpenDownloadWindow("../Common/FileDownloader.aspx?fileKey=2941822")'>Work Order Inquiry - Work Order</a></td><td>06/20/2017 07:14:06</td><td>MBMAYO</td>
        </tr>
    </tbody></table>

The goal is to store each <td> as an item for a collection and then retrieve the date for example <td>06/20/2017 07:50:37</td> from it. This table Grows so I think an array is out of the question?

Edit from comment:

I have been trying call this function, I'm getting a object does not support this method error:

Public Function htmlCell(id As String) As String 
    htmlCell = IE.getElementById("ctl00_ContentPlaceHolder1_gvExtract")
                    .get‌​ElementsByTagName("t‌​d")(id).innerHTML 
End Function

Upvotes: 1

Views: 12078

Answers (3)

ASH
ASH

Reputation: 20302

I think it should be something like this.

Sub Scrape_HTML()

Set ie = CreateObject("InternetExplorer.application")

With ie
    .Visible = True
    .navigate "your_URL_here"

' Wait for the page to fully load; you can't do anything if the page is not fully loaded
Do While .Busy Or _
    .readyState <> 4
    DoEvents
Loop

                        Set Links = ie.document.getElementsByTagName("tr")
                        RowCount = 1

                            ' Scrape out the innertext of each 'tr' element.
                            With Sheets("DataSheet")
                                For Each lnk In Links
                                    .Range("A" & RowCount) = lnk.innerText
                                    RowCount = RowCount + 1
                                Next
                            End With

End Sub

Upvotes: 0

gembird
gembird

Reputation: 14053

What you probably need is something like this. HTH

Dim htmlTable As MSHTML.htmlTable
Dim htmlTableCells As MSHTML.IHTMLElementCollection
Dim htmlTableCell As MSHTML.htmlTableCell
Dim htmlAnchor As MSHTML.HTMLAnchorElement

Set htmlTable = ie.document.getElementById("ctl00_ContentPlaceHolder1_gvExtract")
Set htmlTableCells = htmlTable.getElementsByTagName("td")
With coll2
    For Each htmlTableCell In htmlTableCells
        If VBA.TypeName(htmlTableCell.FirstChild) = "HTMLAnchorElement" Then
            Set htmlAnchor = htmlTableCell.FirstChild
            .Add htmlAnchor.innerHTML
        Else
            .Add htmlTableCell.innerHTML
        End If
    Next
End With

Result

Dim el
For Each el In coll2
    Debug.Print el
Next el

Output:

Delete
Work Order Inquiry - Work Order
06/20/2017 07:50:37
MBMAYO
Delete
Work Order Inquiry - Work Order
06/20/2017 07:39:29
MBMAYO
Delete
Work Order Inquiry - Work Order
06/20/2017 07:23:54
MBMAYO
Delete
Work Order Inquiry - Work Order
06/20/2017 07:16:16
MBMAYO
Delete
Work Order Inquiry - Work Order
06/20/2017 07:14:06
MBMAYO

Upvotes: 3

Andre
Andre

Reputation: 27634

I would try something like this:

Dim htmlTable As Object
Dim collTD As Collection
Dim oNode as Object

' Set IE ...

Set htmlTable = IE.Document.getElementById("ctl00_ContentPlaceHolder1_gvExtract")
' You only want the td's inside htmlTable !
Set collTD = htmlTable.getElementsByTagName("td")

For Each oNode In collTD 
    Debug.Print oNode.InnerHTML
    ' Stop -> use Watch window to drill down into oNode subitems
Next oNode 

and go from there.

Upvotes: 1

Related Questions