Reputation: 5
I have some questions reagrding an Excel VBA program that I want to build.
Basically it's pretty easy. I want to access the following website https://coronavirus.jhu.edu/map.html and extract the Confirmed Cases by Country/Region/Sovereignty (it's the table on the very left of the dashborad) and paste the values in excel.
I know all the basic stuff on how to setup an internetexplorer instance and scraping the page by tags, classes, ids etc. But I think in this sceanrio I cannot use the basic things. I guess it's pretty tricky actually. The information I am looking for is within some tags. But I cannot get their textcontent when I use the getelementsbytagname("strong") approach.
Could someone help me in this case?
I am grateful for any hints, advices and solutions.
Below you'll find the start of my code.
Best Simon
Sub test()
Dim ie As InternetExplorer
Dim html As HTMLDocument
Dim i As Integer
Dim obj_coll As IHTMLElementCollection
Dim obj As HTMLObjectElement
Set ie = New InternetExplorer
ie.Visible = False
ie.navigate "https://coronavirus.jhu.edu/map.html"
Do Until ie.readyState = READYSTATE_COMPLETE
DoEvents
Loop
Debug.Print "Successfully connected with host"
Set html = ie.document
Set obj_coll = html.getElementsByTagName("strong")
For Each obj In obj_coll
Debug.Print obj.innerText
Next obj
ie.Quit
End Sub
Upvotes: 0
Views: 250
Reputation: 84465
You can use the iframe url direct to navigate to. You then need a timed wait to ensure the data has loaded within that iframe. I would then collect nodeLists via faster css selectors. As the nodeLists (one for figures and the other for locations) are the same length you will only need a single loop to index into both lists to get rows of data.
Option Explicit
Public Sub GetCovidFigures()
Dim ie As SHDocVw.InternetExplorer
Set ie = New SHDocVw.InternetExplorer
Dim t As Date
Const MAX_WAIT_SEC As Long = 30
With ie
.Visible = True
.Navigate2 "https://www.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6"
Do
DoEvents
Loop While .Busy Or .readyState <> READYSTATE_COMPLETE
t = Timer
Do
If Timer - t > MAX_WAIT_SEC Then Exit Sub
Loop While .document.querySelectorAll(".feature-list strong").Length = 0
Dim figures As Object, location As Object, results(), i As Long
Set figures = .document.querySelectorAll("h5 strong")
Set location = .document.querySelectorAll("h5 span:last-child")
ReDim results(1 To figures.Length, 1 To 2)
For i = 0 To figures.Length - 1
results(i + 1, 1) = figures.item(i).innerText
results(i + 1, 2) = location.item(i).innerText
Next
.Quit
End With
ActiveSheet.Cells(1, 1).Resize(UBound(results, 1), UBound(results, 2)) = results
End Sub
Consider how frequently you want this. There are large numbers of APIs popping up to supply this data which you could instead issue faster xhr requests to. Additionally, you could simply take the source data in csv form from github here. *Files after Feb 1 (UTC): once a day around 23:59 (UTC). There is a rest API visible in dev tools network tab that is frequently supplying new data in json format which is used to update the page. That can be accessed via Python + requests or R + httr modules for example. I suspect this endpoint is not intended to be hit so look for public APIs.
Upvotes: 2