Reputation: 775
The following code pulls pysicochemical properties from the following link into excel:
https://echa.europa.eu/brief-profile/-/briefprofile/100.002.098
Public Sub GetContents()
Dim XMLReq As New MSXML2.XMLHTTP60
Dim HTMLDoc As New MSHTML.HTMLDocument
XMLReq.Open "Get", "https://echa.europa.eu/brief-profile/-/briefprofile/100.002.098", False
XMLReq.send
HTMLDoc.body.innerHTML = XMLReq.responseText
Set SubSectList = HTMLDoc.getElementsByClassName("col-xs-12 col-lg-10 MainContent")(1)
Set SubSects = SubSectList.getElementsByTagName("dt")
For Each SubSect In SubSects
Debug.Print SubSect.innerText & " : " & SubSect.NextSibling.innerText
Next SubSect
End Sub
I've noticed however that I am getting duplicate results. Inspecting the page I can see that this occurs because the results contain both study results
and summaries
where summaries contain a duplicate of the most important reuslt.
Now looking at the code, I realise that im gathering all elements from ("col-xs-12 col-lg-10 MainContent")(1)
and pulling the elements (header) with tag dt
and the next sibling (data).
I wish to modify the code to pull data from just the study data. Looking at the page code, each study data has ClassName EndpointContent
and this leads to a general coding question using all of this as an example.
How can I write code to loop through elements with ClassName EndpointContent
, return the data within this section, and then move onto the Next EndpointContent
and repeat.
My attempt to achieve this is below but I just dont know how to tie in the SubSects into the SubSectList when elements are are collection (i.e. not specified with (1)) and I get the runtime error 438 object doesn't support property or method:
Public Sub GetContents()
Dim XMLReq As New MSXML2.XMLHTTP60
Dim HTMLDoc As New MSHTML.HTMLDocument
XMLReq.Open "Get", "https://echa.europa.eu/brief-profile/-/briefprofile/100.002.098", False
XMLReq.send
HTMLDoc.body.innerHTML = XMLReq.responseText
Set SubSectList = HTMLDoc.getElementsByClassName("EndpointContent")
Set SubSects = SubSectList.getElementsByTagName("dt")
For Each SubSect In SubSectList
Debug.Print SubSect.innerText & " : " & SubSect.NextSibling.innerText
Next SubSect
End Sub
Ideally I would like to keep the code as original as possible please.
Upvotes: 1
Views: 378
Reputation: 84465
As you want it like your existing code then gather the list of Endpointcontent
class elements and outer loop those, then inner loop for the dt
elements of each outer node. Then, depending on your Office version, you actually need a chained nextSibling
with the later Office updates (older ones just used nextSibling
):
Option Explicit
Public Sub GetContents()
Dim XMLReq As New MSXML2.XMLHTTP60
Dim HTMLDoc As New MSHTML.HTMLDocument
XMLReq.Open "Get", "https://echa.europa.eu/brief-profile/-/briefprofile/100.002.098", False
XMLReq.send
HTMLDoc.body.innerHTML = XMLReq.responseText
Dim SubSectList As Object, SubSect As Object
Set SubSectList = HTMLDoc.getElementsByClassName("EndpointContent")
Dim dt As Object
r = 1
For Each SubSect In SubSectList
For Each dt In SubSect.getElementsByTagName("dt")
ActiveSheet.Cells(r, 1) = dt.innerText & " : " & dt.NextSibling.NextSibling.innerText
'ActiveSheet.Cells(r, 1) = dt.innerText & " : " & dt.NextSibling.innerText
r = r + 1
Next
Next
End Sub
Upvotes: 2