Reputation: 1564
Using Excel VBA, i have to scrape some data from this website.
Since the relevant website objects dont contain an id
, I cannot use HTML.Document.GetElementById
.
However, I noticed that the relevant information is always stored in a <div>
-section like the following:
<div style="padding:7px 12px">Basler Versicherung AG Özmen</div>
Question:
Is it possible to construct a RegExp
that, probably in a Loop, returns the contents inside <div style="padding:7px 12px">
and the next </div>
?
What I have so far is the complete InnerHtml
of the container, obviously I need to add some code to loop over the yet-to-be-constructed RegExp.
Private Function GetInnerHTML(url As String) As String
Dim i As Long
Dim Doc As Object
Dim objElement As Object
Dim objCollection As Object
On Error GoTo catch
'Internet Explorer Object is already assigned
With ie
.Navigate url
While .Busy
DoEvents
Wend
GetInnerHTML = .document.getelementbyId("cphContent_sectionCoreProperties").innerHTML
End With
Exit Function
catch:
GetInnerHTML = Err.Number & " " & Err.Description
End Function
Upvotes: 2
Views: 687
Reputation: 22440
Another way you can achieve the same using XMLHTTP
request method. Give it a go:
Sub Fetch_Data()
Dim S$, I&
With New XMLHTTP60
.Open "GET", "https://www.uid.admin.ch/Detail.aspx?uid_id=CHE-105.805.649", False
.send
S = .responseText
End With
With New HTMLDocument
.body.innerHTML = S
With .querySelectorAll("#cphContent_sectionCoreProperties label[id^='cphContent_ct']")
For I = 0 To .Length - 1
Cells(I + 1, 1) = .Item(I).innerText
Cells(I + 1, 2) = .Item(I).NextSibling.FirstChild.innerText
Next I
End With
End With
End Sub
Reference to add to the library before executing the above script:
Microsoft HTML Object Library
Microsoft XML, V6.0
Upvotes: 2
Reputation: 5677
I don't think you need Regular expressions to find the content on the page. You can use the relative positions of the elements to find the content I believe you are after.
Code
Option Explicit
Public Sub GetContent()
Dim URL As String: URL = "https://www.uid.admin.ch/Detail.aspx?uid_id=CHE-105.805.649"
Dim IE As Object: Set IE = CreateObject("InternetExplorer.Application")
Dim Labels As Object
Dim Label As Variant
Dim Values As Variant: ReDim Values(0 To 1, 0 To 5000)
Dim i As Long
With IE
.Navigate URL
.Visible = False
'Load the page
Do Until IE.busy = False And IE.readystate = 4
DoEvents
Loop
'Find all labels in the table
Set Labels = IE.document.getElementByID("cphContent_pnlDetails").getElementsByTagName("label")
'Iterate the labels, then find the divs relative to these
For Each Label In Labels
Values(0, i) = Label.InnerText
Values(1, i) = Label.NextSibling.Children(0).InnerText
i = i + 1
Next
End With
'Dump the values to Excel
ReDim Preserve Values(0 To 1, 0 To i - 1)
ThisWorkbook.Sheets(1).Range("A1:B" & i) = WorksheetFunction.Transpose(Values)
'Close IE
IE.Quit
End Sub
Upvotes: 2