Nick
Nick

Reputation: 775

Get elements for each instance of a ClassName

The following code pulls pysicochemical properties from the following link into excel:

https://echa.europa.eu/brief-profile/-/briefprofile/100.002.098

   Public Sub GetContents()
    
    Dim XMLReq As New MSXML2.XMLHTTP60
    Dim HTMLDoc As New MSHTML.HTMLDocument
    
    XMLReq.Open "Get", "https://echa.europa.eu/brief-profile/-/briefprofile/100.002.098", False
    XMLReq.send
     
    HTMLDoc.body.innerHTML = XMLReq.responseText

    Set SubSectList = HTMLDoc.getElementsByClassName("col-xs-12 col-lg-10 MainContent")(1)
    Set SubSects = SubSectList.getElementsByTagName("dt")
   
    For Each SubSect In SubSects

    Debug.Print SubSect.innerText & " : " & SubSect.NextSibling.innerText

    Next SubSect

End Sub

I've noticed however that I am getting duplicate results. Inspecting the page I can see that this occurs because the results contain both study results and summaries where summaries contain a duplicate of the most important reuslt.

Now looking at the code, I realise that im gathering all elements from ("col-xs-12 col-lg-10 MainContent")(1) and pulling the elements (header) with tag dt and the next sibling (data).

I wish to modify the code to pull data from just the study data. Looking at the page code, each study data has ClassName EndpointContent and this leads to a general coding question using all of this as an example.

How can I write code to loop through elements with ClassName EndpointContent, return the data within this section, and then move onto the Next EndpointContent and repeat.

My attempt to achieve this is below but I just dont know how to tie in the SubSects into the SubSectList when elements are are collection (i.e. not specified with (1)) and I get the runtime error 438 object doesn't support property or method:

Public Sub GetContents()
    
    Dim XMLReq As New MSXML2.XMLHTTP60
    Dim HTMLDoc As New MSHTML.HTMLDocument
    
    XMLReq.Open "Get", "https://echa.europa.eu/brief-profile/-/briefprofile/100.002.098", False
    XMLReq.send
     
    HTMLDoc.body.innerHTML = XMLReq.responseText

    Set SubSectList = HTMLDoc.getElementsByClassName("EndpointContent")
    
    Set SubSects = SubSectList.getElementsByTagName("dt")
    
    
    For Each SubSect In SubSectList

    Debug.Print SubSect.innerText & " : " & SubSect.NextSibling.innerText

    Next SubSect

End Sub

Ideally I would like to keep the code as original as possible please.

Upvotes: 1

Views: 378

Answers (1)

QHarr
QHarr

Reputation: 84465

As you want it like your existing code then gather the list of Endpointcontent class elements and outer loop those, then inner loop for the dt elements of each outer node. Then, depending on your Office version, you actually need a chained nextSibling with the later Office updates (older ones just used nextSibling):

Option Explicit

Public Sub GetContents()
    
    Dim XMLReq As New MSXML2.XMLHTTP60
    Dim HTMLDoc As New MSHTML.HTMLDocument
    
    XMLReq.Open "Get", "https://echa.europa.eu/brief-profile/-/briefprofile/100.002.098", False
    XMLReq.send
     
    HTMLDoc.body.innerHTML = XMLReq.responseText

    Dim SubSectList As Object, SubSect As Object
    
    Set SubSectList = HTMLDoc.getElementsByClassName("EndpointContent")
    
    Dim dt As Object
    
    r = 1
    
    For Each SubSect In SubSectList
    
        For Each dt In SubSect.getElementsByTagName("dt")
            
            ActiveSheet.Cells(r, 1) = dt.innerText & " : " & dt.NextSibling.NextSibling.innerText
            'ActiveSheet.Cells(r, 1) = dt.innerText & " : " & dt.NextSibling.innerText
            r = r + 1
            
        Next
    
    Next

End Sub

Upvotes: 2

Related Questions