MB93
MB93

Reputation: 35

VBA - HTML scraping problems

I'm attempting to scrape auction data from a website https://www.rbauction.com/heavy-equipment-auctions. My current attempt was to use the below code to pull the website's HTML into VBA and then parse through it and keep only the items I wanted (auction name, number of days, number of items).

Sub RBA_Auction_Scrape()

    Dim S_Sheet As Worksheet
    Dim Look_String As String
    Dim Web_HTML As String
    Dim HTTP_OBJ As New MSXML2.XMLHTTP60

    On Error GoTo ERR_LABEL:
    Set S_Sheet = ActiveWorkbook.ActiveSheet
    Web_HTML = ""
    HTTP_OBJ.Open "GET", "https://www.rbauction.com/heavy-equipment auctions", False
    HTTP_OBJ.Send
    On Error Resume Next
    Select Case HTTP_OBJ.Status
        Case 0: Web_HTML = HTTP_OBJ.responseText
        Case 200: Web_HTML = HTTP_OBJ.responseText
        Case Else: GoTo ERR_LABEL
    End Select

    Debug.Print Web_HTML

End Sub

It successfully pulls in the data, but the 'upcoming heavy equipment auction' section that has all of the names and sizes of the auctions does not get pulled into VBA. I'm not very good with HTML in general but I was hoping someone could offer a solution or at least an explanation as to when I search through the website HTML that is pulled into VBA, the articles that I want are not found.

Upvotes: 2

Views: 1502

Answers (1)

omegastripes
omegastripes

Reputation: 12612

The webpage source HTML by the link provided https://www.rbauction.com/heavy-equipment-auctions doesn't contain the necessary data, it uses AJAX. The website https://www.rbauction.com has an API available. Response is returned in JSON format. Navigate the page e. g. in Chrome, then open Developer Tools window (F12), Network tab, reload (F5) the page and examine logged XHRs. Most relevant data is JSON string returned by the URL https://www.rbauction.com/rba-api/calendar/v1?e1=true:

XHR-previev

XHR-headers

You may use the below VBA code to retrieve info as described above. Import JSON.bas module into the VBA project for JSON processing.

Option Explicit

Sub Test_www_rbauction_com()

    Const Transposed = False ' Output option

    Dim sResponse As String
    Dim vJSON
    Dim sState As String
    Dim i As Long
    Dim aRows()
    Dim aHeader()

    ' Retrieve JSON data
    XmlHttpRequest "GET", "https://www.rbauction.com/rba-api/calendar/v1?e1=true", "", "", "", sResponse
    ' Parse JSON response
    JSON.Parse sResponse, vJSON, sState
    If sState <> "Object" Then
        MsgBox "Invalid JSON response"
        Exit Sub
    End If
    ' Pick core data
    vJSON = vJSON("auctions")
    ' Extract selected properties for each item
    For i = 0 To UBound(vJSON)
        Set vJSON(i) = ExtractKeys(vJSON(i), Array("eventId", "name", "date", "itemCount"))
        DoEvents
    Next
    ' Convert JSON structure to 2-d arrays for output
    JSON.ToArray vJSON, aRows, aHeader
    ' Output
    With ThisWorkbook.Sheets(1)
        .Cells.Delete
        If Transposed Then
            Output2DArray .Cells(1, 1), WorksheetFunction.Transpose(aHeader)
            Output2DArray .Cells(1, 2), WorksheetFunction.Transpose(aRows)
        Else
            OutputArray .Cells(1, 1), aHeader
            Output2DArray .Cells(2, 1), aRows
        End If
        .Columns.AutoFit
    End With
    MsgBox "Completed"

End Sub

Sub XmlHttpRequest(sMethod As String, sUrl As String, arrSetHeaders, sFormData, sRespHeaders As String, sContent As String)

    Dim arrHeader

    'With CreateObject("Msxml2.ServerXMLHTTP")
    '    .SetOption 2, 13056 ' SXH_SERVER_CERT_IGNORE_ALL_SERVER_ERRORS
    With CreateObject("MSXML2.XMLHTTP")
        .Open sMethod, sUrl, False
        If IsArray(arrSetHeaders) Then
            For Each arrHeader In arrSetHeaders
                .SetRequestHeader arrHeader(0), arrHeader(1)
            Next
        End If
        .send sFormData
        sRespHeaders = .GetAllResponseHeaders
        sContent = .responseText
    End With

End Sub

Function ExtractKeys(oSource, aKeys, Optional oDest = Nothing) As Object

    Dim vKey

    If oDest Is Nothing Then Set oDest = CreateObject("Scripting.Dictionary")
    For Each vKey In aKeys
        If oSource.Exists(vKey) Then
            If IsObject(oSource(vKey)) Then
                Set oDest(vKey) = oSource(vKey)
            Else
                oDest(vKey) = oSource(vKey)
            End If
        End If
    Next
    Set ExtractKeys = oDest

End Function

Sub OutputArray(oDstRng As Range, aCells As Variant)

    With oDstRng
        .Parent.Select
        With .Resize(1, UBound(aCells) - LBound(aCells) + 1)
            .NumberFormat = "@"
            .Value = aCells
        End With
    End With

End Sub

Sub Output2DArray(oDstRng As Range, aCells As Variant)

    With oDstRng
        .Parent.Select
        With .Resize( _
                UBound(aCells, 1) - LBound(aCells, 1) + 1, _
                UBound(aCells, 2) - LBound(aCells, 2) + 1)
            .NumberFormat = "@"
            .Value = aCells
        End With
    End With

End Sub

The output for me is as follows:

output

BTW, the similar approach applied in other answers.

Upvotes: 3

Related Questions