Zach Young
Zach Young

Reputation: 23

How to use "getElementByClassName()" to parsing the HTML elements in VBA?

I am going to get the elements' value from HTML. The HTML code is below.

It can be shown on the screen.

Code screenshot

Dim xmlhttp As Object
Dim url As String
Dim toTranslate As String
Dim htmldoc As HTMLDocument


toTranslate = TextBox1.Value
url = "http://dict.youdao.com/search?q=" & toTranslate & "&keyfrom=dict.index"
Set xmlhttp = CreateObject("MSXML2.XMLHTTP") '创建XML对象

xmlhttp.Open "GET", url, False '用GET 发送请求

xmlhttp.send
'等待响应
Do While xmlhttp.readyState <> 4
    DoEvents
Loop
Dim explore As New InternetExplorer


'Set htmldoc =explore.document

MsgBox xmlhttp.responseText

But I want to get each value of tag "li" in elements which class is "trans-container"(The each Chinese words).

The content I want to get

I only know the method "getElementsByClassName()", but I don't know how to use it. Thanks for your help!

Upvotes: 1

Views: 1484

Answers (2)

T.M.
T.M.

Reputation: 9938

Array alternative using FilterXML()

Based on a received response string I demonstrate a way to get the list items via FilterXML() available since vers. 2013+.

Function getListItems(ByVal sResponse As String, Optional IsZeroBased As Boolean = False)
'Purpose: assign list items in div trans-container class to 1-dim array
    'XPath search expression
    Dim xp As String
    xp = "//div[@class='trans-container']/ul/li"
    With WorksheetFunction
        'assign list items to 1-based 2-dim array
        Dim listItems: listItems = .FilterXML(sResponse, xp)
        
        'optional Test display in VB Editor's immediate window
        Debug.Print Join(.Transpose(listItems), ", ")
    
        'return <li> items as 1-dim array (optionally 0-based)
        getListItems = .Transpose(listItems)
        If IsZeroBased Then ReDim Preserve getListItems(0 To UBound(getListItems) - 1)
    End With

End Function

Upvotes: 2

GWD
GWD

Reputation: 3978

You need to create a HTMLDocument object from the response and use it for the parsing. As annotated in the code, it is necessary to use early binding to use the method getElementsByClassName. Try something like the following:

Dim url As String
Dim toTranslate As String

toTranslate = TextBox1.Value
' Note: use https:// rather than http://
url = "https://dict.youdao.com/search?q=" & toTranslate & "&keyfrom=dict.index"

' Creating and sending the request:
Dim xmlhttp As Object
Set xmlhttp = CreateObject("MSXML2.XMLHTTP") '创建XML对象

xmlhttp.Open "GET", url, False '用GET 发送请求
xmlhttp.setRequestHeader "User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
xmlhttp.send ""

' Getting the response
' This needs to be early bound so the method getElementsByClassName is available!
' The required reference is "Microsoft HTML Object Library"
Dim objHTML as HTMLDocument
Set objHTML = New HTMLDocument

objHTML.body.innerHTML = xmlhttp.responseText

' Parsing the response
Dim objTransContainers as Object, objTransContainer as Object
Dim objLis as Object, objLi as Object
Dim retText as String

Set objTransContainers = objHTML.getElementsByClassName("trans-container")

For Each objTransContainer in objTransContainers 
    Set objLis = objTransContainer.getElementsByTagName("li")
    For Each objLi in objLis
        retText = retText & objLi.innerText & vbNewLine
    Next objLi
Next objTransContainer 

MsgBox retText

Alternatively, you can use only the tags and check for the class name in a loop for the parsing. The advantage is, that this method will also work with a late bound HTMLDocument:

' Getting the response:
Dim objHTML as Object
Set objHTML = CreateObject("htmlFile")

' Note: this objHTML.write will not work with early binding! 
' In that case you have to use the .body.innerHTML 
' like in the code sample above.
With objHTML
    .Open
    .write xmlhttp.responseText
    .Close
End With

' Parsing the response
Dim objDivs as Object, objDiv as Object
Dim objLis as Object, objLi as Object
Dim retText as String

Set objDivs = objHTML.getElementsByTagName("div")

For Each objDiv in objDivs
    If objDiv.className = "trans-container" Then
        Set objLis = objDiv.getElementsByTagName("li")
        For Each objLi in objLis
            retText = retText & objLi.innerText & vbNewLine
        Next objLi
    End If
Next objDiv

MsgBox retText

Upvotes: 2

Related Questions