Yung Lin Ma
Yung Lin Ma

Reputation: 131

How to scrape web data in vba

I have followed jsotola's suggestion and recorded the following macro, but encountered an error, how can I solve it? Run time error 91 and the following code has been highlighted

Selection.ListObject.TableObject.Refresh

Sub Macro1()
     ActiveWorkbook.Queries.Add Name:="1-1-1", Formula:= _
        "let" & Chr(13) & "" & Chr(10) & "    Source = Web.Page(Web.Contents(""http://www.hkjc.com/English/racing/Horse.asp?HorseNo=V099""))," & Chr(13) & "" & Chr(10) & "    Data0 = Source{0}[Data]," & Chr(13) & "" & Chr(10) & "    #""Changed Type"" = Table.TransformColumnTypes(Data0,{{""Column1"", type text}, {""Column2"", type text}, {""Column3"", type text}})" & Chr(13) & "" & Chr(10) & "in" & Chr(13) & "" & Chr(10) & "    #""Changed Type"""
    ActiveWorkbook.Queries.Add Name:="1-1-2", Formula:= _
        "let" & Chr(13) & "" & Chr(10) & "    Source = Web.Page(Web.Contents(""http://www.hkjc.com/English/racing/Horse.asp?HorseNo=V099""))," & Chr(13) & "" & Chr(10) & "    Data1 = Source{1}[Data]," & Chr(13) & "" & Chr(10) & "    #""Changed Type"" = Table.TransformColumnTypes(Data1,{{""Column1"", type text}, {""Column2"", type text}, {""Column3"", type text}, {""Column4"", type text}})" & Chr(13) & "" & Chr(10) & "in" & Chr(13) & "" & Chr(10) & "    #""Changed Type"""
    Workbooks("Book1").Connections.Add2 "Query - Table 0", _
        "Connection to the 'Table 0' query in the workbook.", _
        "OLEDB;Provider=Microsoft.Mashup.OleDb.1;Data Source=$Workbook$;Location=Table 0" _
        , """Table 0""", 6, True, False
    Workbooks("Book1").Connections.Add2 "Query - Table 1", _
        "Connection to the 'Table 1' query in the workbook.", _
        "OLEDB;Provider=Microsoft.Mashup.OleDb.1;Data Source=$Workbook$;Location=Table 1" _
        , """Table 1""", 6, True, False
    Sheets.Add After:=ActiveSheet
    Selection.ListObject.TableObject.Refresh
    Sheets.Add After:=ActiveSheet
    Selection.ListObject.TableObject.Refresh
End Sub

enter image description here

Upvotes: 0

Views: 478

Answers (1)

QHarr
QHarr

Reputation: 84465

You could use the following script.

① I grab the left hand side links with

.getElementsByTagName("table")(3).getElementsByTagName("a")

As these return relative paths starting with "about:", I replace this part with the fixed prefix string BASESTRING. This gives the absolute path.

② I target the table with main info by getting get a collection of the table tags and selecting the appropriate table by index.

Set hTable = .getElementsByTagName("table")(6)

③ Additionally, as targeting by className is not supported with method I am using, due to late bound HTMLfile I assume), I parse the sub header out "SMART BOY (V076)" from the innerHTML of an element containing this info. Otherwise, it could have been targeted more cleanly with .getElementsByClassName("subsubheader")(0)


Example data on page:

Example data


Example output from code:

Output from code


Code:

Option Explicit
Public Sub GetTable()
    Dim sResponse As String, hTable As Object
    Application.ScreenUpdating = False
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "http://www.hkjc.com/english/racing/horse.asp?HorseNo=V076", False
        .send
        sResponse = StrConv(.responseBody, vbUnicode)
    End With

    sResponse = Mid$(sResponse, InStr(1, sResponse, "HEAD"))

    With CreateObject("htmlFile")
        .Write sResponse
        Set hTable = .getElementsByTagName("table")(6)
        Dim links As Object, title As String
        Set links = .getElementsByTagName("table")(3).getElementsByTagName("a")
        title = Replace$(Split(Split(.getElementsByTagName("table")(2).innerHTML, "title_eng_text>")(1), "<")(0), "&nbsp;", vbNullString)
    End With

    Dim tSection As Object, tRow As Object, tCell As Object, tr As Object, td As Object, r As Long, c As Long, hBody As Object
    Set hBody = hTable.getElementsByTagName("tbody")

    Const BASESTRING As String = "http://www.hkjc.com/english/racing/"

    With ActiveSheet
        .Cells(1, 1) = title
        r = 2
        For Each tSection In hBody               'HTMLTableSection
            Set tRow = tSection.getElementsByTagName("tr") 'HTMLTableRow
            For Each tr In tRow
                Set tCell = tr.getElementsByTagName("td")
                c = 1
                .Cells(r, c) = links(r - 1).innerHTML
                .Cells(r, c + 1) = Replace$(links(r - 1), "about:", BASESTRING)
                For Each td In tCell             'DispHTMLElementCollection
                    .Cells(r, c + 2).Value = td.innerText 'HTMLTableCell
                    c = c + 1
                Next td
                r = r + 1
            Next tr
        Next tSection
        .UsedRange.Columns.AutoFit
    End With
    Application.ScreenUpdating = True
End Sub

Upvotes: 1

Related Questions