Ko Wi

Reputation: 13

Scraping website using getElementsByClassName --> wrong results

I am trying to scrape the innerText of all elements whose class name is "disabled" in the following HTML snippet (screenshot: HTML code).

The code I am using to do this in MS Access (VBA) is as follows:

Set IE = CreateObject("InternetExplorer.Application")

claimLink = "https://www.XXXXX.com/"


IE.navigate claimLink
    Do
       DoEvents
    Loop Until IE.ReadyState = 4

Set menuEnabled = IE.Document.getElementsByClassName("disabled")(0)


For Each Item In menuEnabled
    MsgBox (Item.innerText & " --> " & Item.className)
Next

IE.Quit

Set menuEnabled = Nothing
Set searchres = Nothing
Set IE = Nothing

As a result, though, I get all items in this list, and MS Access also reports that the class name of every item (Bibliographic data, Description, Claims, etc.) is "disabled".

Can anyone please tell me what's wrong with my code? All I want it to return is "Description", "Claims" and "Cited Documents".

These grey items are the only items I want returned.

Thanks! Kornelius

Upvotes: 1

Views: 149

Answers (1)

QHarr

Reputation: 84465

The page appears to need a short wait before the elements are updated. I am using a CSS selector combination to target the elements of interest.

.epoContentNav [class=disabled]

The "." is a class selector. It selects elements whose class name matches what follows the ".", i.e. epoContentNav. The " " is a descendant combinator, meaning that what is on the right must be a descendant (at any depth, not only a direct child) of what is on the left. The [] is an attribute selector, which selects an element by the attribute named within. In this case I am using an attribute=value combination to specify that the class attribute must be exactly disabled. The whole thing reads as: find elements with class disabled that have an ancestor with class epoContentNav. It selects all the nav bar elements with class disabled.

Info on those selectors here.
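To make the difference between the attribute selector and a plain class selector concrete, here is a minimal sketch. CompareSelectors is a hypothetical helper that is not part of the original answer, and doc is assumed to be an already loaded document such as IE.document. On this page both selectors may well return the same nodes; they differ only when an element carries more than one class.

```vba
Option Explicit

' Hypothetical helper for illustration only; doc is assumed to be a
' loaded HTML document (for example IE.document).
Public Sub CompareSelectors(ByVal doc As Object)
    Dim exactMatch As Object, anyMatch As Object

    ' Attribute selector: the class attribute must be exactly "disabled".
    Set exactMatch = doc.querySelectorAll(".epoContentNav [class=disabled]")

    ' Class selector: "disabled" only has to be one of the element's classes.
    Set anyMatch = doc.querySelectorAll(".epoContentNav .disabled")

    Debug.Print "[class=disabled]:", exactMatch.Length
    Debug.Print ".disabled:", anyMatch.Length
End Sub
```

An element written as class="disabled selected" would be counted by the second selector but not by the first.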

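Option Explicit
' Requires a reference to Microsoft Internet Controls (Tools > References).
Public Sub GetInfo()
    Dim IE As New InternetExplorer, i As Long, nodeList

    With IE
        .Visible = True
        .navigate "https://worldwide.espacenet.com/publicationDetails/claims?DB=&ND=&locale=en_EP&FT=D&CC=DE&NR=1952914A&KC=A&tree=false#"

        ' Wait until the page has finished loading.
        While .Busy Or .readyState < 4: DoEvents: Wend

        ' Give the page's scripts about 2 seconds to update the class names.
        ' Application.Wait is an Excel method; in Access use a different delay.
        Application.Wait Now + TimeSerial(0, 0, 2)

        Set nodeList = .document.querySelectorAll(".epoContentNav [class=disabled]")
        For i = 0 To nodeList.Length - 1
            Debug.Print nodeList.item(i).innerText, nodeList.item(i).getAttribute("class")
        Next
        Stop ' Pauses execution so the output can be inspected
        '.Quit '<== Remember to quit application
    End With
End Sub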
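Since the question is about MS Access, where Application.Wait is not available, the fixed delay could be replaced by something like the sketch below. Pause is a hypothetical helper, not part of the original answer; it relies only on the built-in Timer function and DoEvents, and the 2 second figure is simply the value used above.

```vba
Option Explicit

' Hypothetical helper: waits roughly the given number of seconds while
' keeping the host application responsive.
Public Sub Pause(ByVal seconds As Single)
    Dim started As Single, elapsed As Single

    started = Timer                      ' Seconds elapsed since midnight
    Do
        DoEvents                         ' Let pending events be processed
        elapsed = Timer - started
        ' Timer restarts at midnight, so correct a negative difference.
        If elapsed < 0 Then elapsed = elapsed + 86400
    Loop While elapsed < seconds
End Sub
```

With this in the same project, the line Application.Wait Now + TimeSerial(0, 0, 2) could be swapped for Pause 2.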

Upvotes: 1
