Xbis1986
Xbis1986

Reputation: 1

How can I scrape a table of a specific website via Excel VBA?

I would like to help me if you can I want to get the data from the table in the website:https://www.balkangashub.bg/en, I have written the above vba code but the only thing it returns is the header of the table. Any help would be appreciated. Thank you again.


Sub Web_Table_Option_Two()
    Dim HTMLDoc As New HTMLDocument
    Dim objTable As Object
    Dim lRow As Long
    Dim lngTable As Long
    Dim lngRow As Long
    Dim lngCol As Long
    Dim ActRw As Long
    Dim objIE As InternetExplorer
    Set objIE = New InternetExplorer
    objIE.navigate "https://www.balkangashub.bg/en"

    Do Until objIE.readyState = 4 And Not objIE.Busy
        DoEvents
    Loop
    Application.Wait (Now + TimeValue("0:00:03"))
    
    
    HTMLDoc.body.innerHTML = objIE.document.body.innerHTML
    
    With HTMLDoc.body
    
        Set objTable = .getElementsByTagName("table")
        For lngTable = 0 To objTable.Length - 1
            For lngRow = 0 To objTable(lngTable).Rows.Length - 1
                For lngCol = 0 To objTable(lngTable).Rows(lngRow).Cells.Length - 1
                    ThisWorkbook.Sheets(1).Cells(ActRw + lngRow + 1, lngCol + 1) = objTable(lngTable).Rows(lngRow).Cells(lngCol).innerText
                Next lngCol
            Next lngRow
            ActRw = ActRw + objTable(lngTable).Rows.Length + 1
        Next lngTable
    End With
    
    objIE.Quit
    
    MsgBox "Completed"
    
End Sub

Upvotes: 0

Views: 173

Answers (1)

QHarr
QHarr

Reputation: 84465

If you make the IE instance visible you will see that the page hasn't loaded properly and as such the table content is missing, hence the result you are observing. As this happens when automating with IE, and not, for example, when manually loading the page via IE, I assume there is potentially something about the automation being picked up and causing the load to stop. There is nothing in the robots.txt other than a specification that all UAs are allowed. There is nothing about scraping/crawling in the T&Cs either.

You can, however, obtain the content by reproducing the process the page deploys to dynamically retrieve this content i.e. via XHR call. The page does this with ajax. As you need to pick up various cookies to pass a token challenge, you can use ServerXMLHTTP60. This requires adding a reference to Microsoft XML, 6.0.

A little investigation and it seems, for VBA POST, there are a number of additional headers needed as shown below.

The response is JSON so you may wish to parse the results with a JSON parser such as jsonconverter.bas. There are tons of examples of how to do this on SO.


Option Explicit

Public Sub GetData()

    Dim xhr As MSXML2.ServerXMLHTTP60

    Set xhr = New MSXML2.ServerXMLHTTP60

    With xhr
        .Open "GET", "https://www.balkangashub.bg/en", False
        .send
        
        Dim html As MSHTML.HTMLDocument, csrfToken As String
        
        Set html = New MSHTML.HTMLDocument
        
        html.body.innerHTML = "<body>" & .responseText & "</body>"
        
        csrfToken = html.querySelector("meta[name=csrf-token]").Content
        
        .Open "POST", "https://www.balkangashub.bg/ajax/spotMarketData", False
        .setRequestHeader "x-csrf-token", csrfToken
        .setRequestHeader "user-agent", "mozilla/5.0"
        .setRequestHeader "x-requested-with", "XMLHttpRequest"
        .setRequestHeader "content-type", "application/x-www-form-urlencoded; charset=UTF-8"
        .send "date_from=2022-05-31&date_to=2022-06-06&type=DA&year=2022"
        
        Debug.Print .responseText
   
    End With
End Sub

Upvotes: 2

Related Questions