VBA webscraper - Return InnerHTML with regex

Question

Using Excel VBA, I have to scrape some data from this website.

Since the relevant website objects don't contain an id, I cannot use HTML.Document.GetElementById.

However, I noticed that the relevant information is always stored in the same kind of HTML section, like the following:

Basler Versicherung AG Özmen

Question: Is it possible to construct a RegExp that, probably in a loop, returns the contents of each such element together with the contents of the element that follows it?

What I have so far returns the complete innerHTML of the container; obviously I still need to add code that loops over the matches of the yet-to-be-constructed RegExp.

Private Function GetInnerHTML(url As String) As String
    ' Assumes a module-level InternetExplorer object named "ie" has already been created
    On Error GoTo catch
    With ie
        .Navigate url
        ' Wait until the page has finished loading (READYSTATE_COMPLETE = 4)
        While .Busy Or .readyState <> 4
            DoEvents
        Wend
        ' Return the raw HTML of the container that holds the data
        GetInnerHTML = .document.getElementById("cphContent_sectionCoreProperties").innerHTML
    End With
    Exit Function
catch:
    ' On failure, return the error number and description instead of HTML
    GetInnerHTML = Err.Number & " " & Err.Description
End Function

SIM · Accepted Answer

Another way to achieve the same is to use an XMLHTTP request. Give it a go:

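Sub Fetch_Data()
    Dim S$, I&

    ' Download the page source synchronously
    With New XMLHTTP60
        .Open "GET", "https://www.uid.admin.ch/Detail.aspx?uid_id=CHE-105.805.649", False
        .send
        S = .responseText
    End With

    ' Load the source into an HTML document so it can be queried with CSS selectors
    With New HTMLDocument
        .body.innerHTML = S
        ' Select every <label> inside the container whose id starts with "cphContent_ct"
        With .querySelectorAll("#cphContent_sectionCoreProperties label[id^='cphContent_ct']")
            For I = 0 To .Length - 1
                ' Column A: the label text; column B: text of the first child of the label's next sibling
                Cells(I + 1, 1) = .Item(I).innerText
                Cells(I + 1, 2) = .Item(I).NextSibling.FirstChild.innerText
            Next I
        End With
    End With
End Sub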

References to add (Tools > References in the VBA editor) before running the script above:

Microsoft HTML Object Library
Microsoft XML, V6.0
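For comparison, the RegExp approach the question asks about could be sketched as below, feeding it the string returned by GetInnerHTML. This is a minimal sketch under a loudly stated assumption: the original HTML snippet did not survive, so the pattern assumes each caption sits in a <label ...>...</label> and its value in the next <span ...>...</span>. The function name ExtractLabelValuePairs is hypothetical. It uses late binding (CreateObject("VBScript.RegExp")), so no extra reference is needed.

```vba
' Hypothetical helper: returns a Collection of 2-element arrays (label, value).
' ASSUMPTION: captions are in <label ...>...</label>, values in the next <span ...>...</span>.
Public Function ExtractLabelValuePairs(ByVal html As String) As Collection
    Dim re As Object
    Dim matches As Object
    Dim m As Object
    Dim pairs As New Collection

    ' Late-bound VBScript RegExp, so no library reference is required
    Set re = CreateObject("VBScript.RegExp")
    re.Global = True        ' return every match, not just the first
    re.IgnoreCase = True    ' tag names may be upper- or lower-case
    ' Group 1 = label text; lazy [\s\S]*? skips to the next <span>; group 2 = span text
    re.Pattern = "<label[^>]*>([\s\S]*?)</label>[\s\S]*?<span[^>]*>([\s\S]*?)</span>"

    Set matches = re.Execute(html)
    For Each m In matches
        ' VBA.Array is always zero-based, regardless of any Option Base setting
        pairs.Add VBA.Array(Trim$(m.SubMatches(0)), Trim$(m.SubMatches(1)))
    Next m

    Set ExtractLabelValuePairs = pairs
End Function

Sub DemoRegex()
    Dim p As Variant
    For Each p In ExtractLabelValuePairs(GetInnerHTML("https://www.uid.admin.ch/Detail.aspx?uid_id=CHE-105.805.649"))
        Debug.Print p(0), p(1)
    Next p
End Sub
```

Regular expressions over HTML are brittle: if a label has no following span, the lazy skip will pair it with a later span. The DOM-based selector approach in the accepted answer is more robust against such markup changes.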
