Reputation: 35
I'm attempting to scrape auction data from a website https://www.rbauction.com/heavy-equipment-auctions. My current attempt was to use the below code to pull the website's HTML into VBA and then parse through it and keep only the items I wanted (auction name, number of days, number of items).
Sub RBA_Auction_Scrape()
Dim S_Sheet As Worksheet
Dim Look_String As String
Dim Web_HTML As String
Dim HTTP_OBJ As New MSXML2.XMLHTTP60
On Error GoTo ERR_LABEL:
Set S_Sheet = ActiveWorkbook.ActiveSheet
Web_HTML = ""
HTTP_OBJ.Open "GET", "https://www.rbauction.com/heavy-equipment auctions", False
HTTP_OBJ.Send
On Error Resume Next
Select Case HTTP_OBJ.Status
Case 0: Web_HTML = HTTP_OBJ.responseText
Case 200: Web_HTML = HTTP_OBJ.responseText
Case Else: GoTo ERR_LABEL
End Select
Debug.Print Web_HTML
End Sub
It successfully pulls in the data, but the 'upcoming heavy equipment auction' section that has all of the names and sizes of the auctions does not get pulled into VBA. I'm not very good with HTML in general but I was hoping someone could offer a solution or at least an explanation as to when I search through the website HTML that is pulled into VBA, the articles that I want are not found.
Upvotes: 2
Views: 1502
Reputation: 12612
The webpage source HTML by the link provided https://www.rbauction.com/heavy-equipment-auctions doesn't contain the necessary data, it uses AJAX. The website https://www.rbauction.com has an API available. Response is returned in JSON format. Navigate the page e. g. in Chrome, then open Developer Tools window (F12), Network tab, reload (F5) the page and examine logged XHRs. Most relevant data is JSON string returned by the URL https://www.rbauction.com/rba-api/calendar/v1?e1=true:
You may use the below VBA code to retrieve info as described above. Import JSON.bas module into the VBA project for JSON processing.
Option Explicit
Sub Test_www_rbauction_com()
Const Transposed = False ' Output option
Dim sResponse As String
Dim vJSON
Dim sState As String
Dim i As Long
Dim aRows()
Dim aHeader()
' Retrieve JSON data
XmlHttpRequest "GET", "https://www.rbauction.com/rba-api/calendar/v1?e1=true", "", "", "", sResponse
' Parse JSON response
JSON.Parse sResponse, vJSON, sState
If sState <> "Object" Then
MsgBox "Invalid JSON response"
Exit Sub
End If
' Pick core data
vJSON = vJSON("auctions")
' Extract selected properties for each item
For i = 0 To UBound(vJSON)
Set vJSON(i) = ExtractKeys(vJSON(i), Array("eventId", "name", "date", "itemCount"))
DoEvents
Next
' Convert JSON structure to 2-d arrays for output
JSON.ToArray vJSON, aRows, aHeader
' Output
With ThisWorkbook.Sheets(1)
.Cells.Delete
If Transposed Then
Output2DArray .Cells(1, 1), WorksheetFunction.Transpose(aHeader)
Output2DArray .Cells(1, 2), WorksheetFunction.Transpose(aRows)
Else
OutputArray .Cells(1, 1), aHeader
Output2DArray .Cells(2, 1), aRows
End If
.Columns.AutoFit
End With
MsgBox "Completed"
End Sub
Sub XmlHttpRequest(sMethod As String, sUrl As String, arrSetHeaders, sFormData, sRespHeaders As String, sContent As String)
Dim arrHeader
'With CreateObject("Msxml2.ServerXMLHTTP")
' .SetOption 2, 13056 ' SXH_SERVER_CERT_IGNORE_ALL_SERVER_ERRORS
With CreateObject("MSXML2.XMLHTTP")
.Open sMethod, sUrl, False
If IsArray(arrSetHeaders) Then
For Each arrHeader In arrSetHeaders
.SetRequestHeader arrHeader(0), arrHeader(1)
Next
End If
.send sFormData
sRespHeaders = .GetAllResponseHeaders
sContent = .responseText
End With
End Sub
Function ExtractKeys(oSource, aKeys, Optional oDest = Nothing) As Object
Dim vKey
If oDest Is Nothing Then Set oDest = CreateObject("Scripting.Dictionary")
For Each vKey In aKeys
If oSource.Exists(vKey) Then
If IsObject(oSource(vKey)) Then
Set oDest(vKey) = oSource(vKey)
Else
oDest(vKey) = oSource(vKey)
End If
End If
Next
Set ExtractKeys = oDest
End Function
Sub OutputArray(oDstRng As Range, aCells As Variant)
With oDstRng
.Parent.Select
With .Resize(1, UBound(aCells) - LBound(aCells) + 1)
.NumberFormat = "@"
.Value = aCells
End With
End With
End Sub
Sub Output2DArray(oDstRng As Range, aCells As Variant)
With oDstRng
.Parent.Select
With .Resize( _
UBound(aCells, 1) - LBound(aCells, 1) + 1, _
UBound(aCells, 2) - LBound(aCells, 2) + 1)
.NumberFormat = "@"
.Value = aCells
End With
End With
End Sub
The output for me is as follows:
BTW, the similar approach applied in other answers.
Upvotes: 3