TradorDave
TradorDave

Reputation: 49

How to get the data from the children of a specific parent HTML element in VBA

Once again I find myself encountering a very specific problem. I'm pretty new to VBA, and especially HTML, so bear with me. I've built a functioning web scraper in VBA, but there are a few specific tasks I want to accomplish that I can't figure out.

Here is the HTML sample that my question refers to.

I've replaced all the stuff that doesn't matter with ellipses. The important part that I want to scrape is in the "a" tags, the "data-shorturl" (or just the innerText). This is a website name. There are up to five of these, but there are not always five. This is also just one of two section with up to five websites listed. The posted section has the children of <div class="referralsSites referring">, and the other has the children of <div class="referralsSites destination">.

Each website in the "referring" section I want to assign to "Up" variables -- the first website assigned to "Up1", the second to "Up2", and so on, but only depending on how many websites are in the "referring" section. I want to do the same in the "destination" section, but assigned to "Down" variables (Down1, Down2, etc.) depending on how many destination sites there are.

If I were to just use getElementsByClassName("websitePage-listItemLink js-tooltipTarget"), for example, I wouldn't be able to differentiate between the referral and destination sites.

Here is my code so far:

Sub GetSimilarWebData()
    Dim appIE As InternetExplorer
    Dim HTML As HTMLDocument
    Dim ieWindow As SHDocVw.InternetExplorer
    Dim URL As String
    Dim Rankings As IHTMLElementCollection, Traffic As IHTMLElementCollection, ReferSites As IHTMLElementCollection, DestSites As IHTMLElementCollection, _
        rSite As IHTMLElement, rSiteNo As Long, dSite As IHTMLElement, dSiteNo As Long, GlobalRank As String, CountryName As String, CountryRank As String, _
        Visits As String, Direct As String, Refer As String, Search As String, Social As String, Display As String, _
        Up1 As String, Up2 As String, Up3 As String, Up4 As String, Up5 As String, _
        D1 As String, D2 As String, D3 As String, D4 As String, D5 As String
    Dim FraudLast As Long
    CheckLast = Worksheets("Sheet1").Range("I1").End(xlDown).Offset(1).Row

    webStr = Worksheets("Sheet1").Range("A" & CheckLast).Value

    Set appIE = New InternetExplorer
        appIE.Visible = False
        appIE.navigate "https://www.similarweb.com/website/" & webStr

    Do While appIE.readyState <> READYSTATE_COMPLETE
        Application.StatusBar = "Connecting to SimilarWeb..."
        DoEvents
    Loop

    Set HTML = appIE.document

    Set appIE = Nothing
        Application.StatusBar = ""

    Set Rankings = HTML.getElementsByClassName("rankingItem-value")
        GlobalRank = Rankings(0).innerText
            If GlobalRank = "N/A" Then
                GlobalRank = "null"
                CountryName = "null"
                CountryRank = "null"
            Else
                CountryName = HTML.getElementsByClassName("rankingItem-subTitle")(1).innerText
                CountryRank = Rankings(1).innerText
            End If

    Visits = HTML.getElementsByClassName("engagementInfo-value engagementInfo-value--large u-text-ellipsis")(0).innerText
        If InStr(Visits, "M") <> 0 Then
            Visits = Replace(Visits, ".", "")
            Visits = Replace(Visits, "M", "00000")
        ElseIf InStr(Visits, "K") <> 0 Then
            Visits = Replace(Visits, ".", "")
            Visits = Replace(Visits, "K", "00")
        ElseIf InStr(Visits, "B") <> 0 Then
            Visits = Replace(Visits, ".", "")
            Visits = Replace(Visits, "B", "00000000")
        End If

    Set Traffic = HTML.getElementsByClassName("trafficSourcesChart-value")
        Direct = Traffic(0).innerText
        Refer = Traffic(1).innerText
        Search = Traffic(2).innerText
        Social = Traffic(3).innerText
        Display = Traffic(4).innerText

'Here's what I've started off with:    
    Set ReferSite = HTML.getElementsByClassName("referralsSites referring")
        rSiteNo = ReferSite.Length
    Set DestSite = HTML.getElementsByClassName("referralsSites destination")
        dSiteNo = DestSite.Length
        'For Each rSite In ReferSite
End Sub

I'm not really sure how to approach the problem. Everything else in my code works fine, but of course if there's anything I can do to improve the speed that would also be welcome.

All of this is referring to data on similarweb.com.

Upvotes: 1

Views: 11555

Answers (1)

Robin Mackenzie
Robin Mackenzie

Reputation: 19289

The getElementsByClassName method can be used on a IHTMLElement object as well as HTMLDocument object. This means you can get the separate lists of referral and destination sites in two 'hops'.

First get the <div>s with the classname of referralsSites referring or referralsSites destination. The getElementsByClassName method returns a IHTMLElementCollection which is a collection of IHTMLElement. So you get the 0th element of the collection (assuming there's only one <div>) and then get the <a>s within that <div> with a class of websitePage-listItemLink by calling getElementsByClassName method again on the IHTMLElement for the <div>.

Here is an example for stackoverflow.com - I'm just doing Debug.Print of the output but you might want to assign the site-names into an array, or Collection or something.

Option Explicit

Sub Test()

    'references required:
    'Microsoft HTML Object Library
    'Microsoft Internet Controls

    Dim strUrl As String
    Dim objIe As InternetExplorer
    Dim objHtml As HTMLDocument
    Dim strHtml As String
    Dim objDivs As IHTMLElementCollection
    Dim objAnchors As IHTMLElementCollection
    Dim intCounter As Integer

    'set target to scrape
    strUrl = "https://www.similarweb.com/website/stackoverflow.com"

    'get html from page
    Set objIe = New InternetExplorer
    objIe.Visible = False
    objIe.navigate strUrl
    While objIe.readyState <> READYSTATE_COMPLETE
        DoEvents
    Wend

    'assign html to DOM document
    Set objHtml = New HTMLDocument
    Set objHtml = objIe.document

    'get referrals
    Set objDivs = objHtml.getElementsByClassName("referralsSites referring")
    If objDivs.Length > 0 Then
        Set objAnchors = objDivs(0).getElementsByClassName("websitePage-listItemLink")
        Debug.Print "Referrers:"
        If objAnchors.Length > 0 Then
            For intCounter = 0 To objAnchors.Length - 1
                Debug.Print objAnchors(intCounter).innerText
            Next intCounter
        End If
    End If

    'get destinations
    Set objDivs = objHtml.getElementsByClassName("referralsSites destination")
    If objDivs.Length > 0 Then
        Set objAnchors = objDivs(0).getElementsByClassName("websitePage-listItemLink")
        Debug.Print "Destinations:"
        If objAnchors.Length > 0 Then
            For intCounter = 0 To objAnchors.Length - 1
                Debug.Print objAnchors(intCounter).innerText
            Next intCounter
        End If
    End If

    'clean up
    Set objHtml = Nothing
    objIe.Quit
    Set objIe = Nothing

End Sub

This gives an output of:

Referrers:
news.ycombinator.com
qwant.com
github.com
remoteok.io
serverfault.com
Destinations:
jsfiddle.net
youtube.com
github.com
i.sstatic.net
w3schools.com

Upvotes: 2

Related Questions