zimide82

Reputation: 57

Get all elements in a div with VBA

I am trying to scrape Project Gutenberg.

I can use .getElementsByClassName("chapter") to get the divs that hold the chapters. However, I can't get all the elements inside those divs as a collection that I could then iterate over.

Sub getZ()
Dim H As Object, C As New DataObject, stryn&, cptr%, html As New HTMLDocument, p As HTMLHtmlElement, para As Object, i&
Set H = CreateObject("WinHTTP.WinHTTPRequest.5.1")

Application.ScreenUpdating = False

With H
    .SetAutoLogonPolicy 0
    .SetTimeouts 0, 0, 0, 0
    .Open "GET", "https://www.gutenberg.org/files/8164/8164-h/8164-h.htm", False
    .Send
    .WaitForResponse
End With

html.body.innerHTML = H.ResponseText
Set para = html.getElementsByClassName("chapter").getElementsByTagName("*")

i = 1

For Each p In para
    Worksheets("Output").Range("A" & i & "") = p.innerText
    i = i + 1
Next

Application.ScreenUpdating = True
End Sub

The line calling getElementsByTagName("*") raises an error saying that the object doesn't support this method.

Upvotes: 2

Views: 1654

Answers (2)

QHarr

Reputation: 84465

A cleaner and faster approach is to combine both requirements (all children of elements with class "chapter") into a single CSS selector, and then loop over the returned NodeList, e.g.:

With html.querySelectorAll(".chapter > *")
    For i = 0 To .Length - 1
        Worksheets("Output").Range("A" & i + 1) = .Item(i).innerText
    Next
End With
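The snippet above assumes html already holds the page. As a self-contained sketch of the same idea, the hypothetical helper ChapterChildTexts below loads an HTML string, runs the ".chapter > *" selector and returns the innerText of each match in a VBA Collection. It assumes a reference to Microsoft HTML Object Library (MSHTML); it has not been checked against every Office/IE version.

```vba
' Sketch only: collect the innerText of every direct child of each element
' with class "chapter", using one CSS selector instead of two lookups.
' Assumes a reference to "Microsoft HTML Object Library" (MSHTML).
' ChapterChildTexts is a hypothetical helper name.
Public Function ChapterChildTexts(ByVal pageHtml As String) As Collection
    Dim html As New MSHTML.HTMLDocument
    Dim nodes As Object
    Dim result As New Collection
    Dim i As Long

    html.body.innerHTML = pageHtml

    ' ".chapter > *" matches only the direct children of .chapter elements
    Set nodes = html.querySelectorAll(".chapter > *")

    ' The NodeList is zero-based
    For i = 0 To nodes.Length - 1
        result.Add nodes.Item(i).innerText
    Next i

    Set ChapterChildTexts = result
End Function
```

To write the results out, pass it H.ResponseText, loop the returned Collection with For Each and assign each item to Worksheets("Output").Range("A" & row), incrementing row as in the question.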

Upvotes: 2

jacouh

Reputation: 8741

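Your code does not work because html.getElementsByClassName("chapter") returns an Object/DispHTMLElementCollection (a collection, similar to an array), which has no getElementsByTagName() method. An individual element such as an Object/HTMLDivElement does have that method, so loop over the collection and call it on each element. This will work: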

Option Explicit

' Requires a reference to Microsoft HTML Object Library (for HTMLDocument)
Sub getZ()
' p holds elements of any type (p, h2, div, ...), so declare it As Object
Dim H As Object, html As New HTMLDocument, p As Object, para As Object, i&
Dim objChapters As Object, objChapter1 As Object

Set H = CreateObject("WinHTTP.WinHTTPRequest.5.1")

Application.ScreenUpdating = False

With H
    .SetAutoLogonPolicy 0
    .SetTimeouts 0, 0, 0, 0
    .Open "GET", "https://www.gutenberg.org/files/8164/8164-h/8164-h.htm", False
    .Send
    .WaitForResponse
End With

html.Body.innerHTML = H.responseText

' getElementsByClassName returns a collection, which has no getElementsByTagName method
Set objChapters = html.getElementsByClassName("chapter")

i = 1

' Call getElementsByTagName on each element of the collection instead
For Each objChapter1 In objChapters
  Set para = objChapter1.getElementsByTagName("*")
  For Each p In para
    Worksheets("Output").Range("A" & i) = p.innerText
    i = i + 1
  Next
Next

Application.ScreenUpdating = True

' Release object references
Set objChapters = Nothing
Set objChapter1 = Nothing
Set para = Nothing
Set p = Nothing
Set html = Nothing
Set H = Nothing

End Sub

This gets all descendant elements (children, grandchildren and so on) of every element with class 'chapter'.
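If only one chapter is needed, the collection can be indexed instead of looped. The sketch below uses a hypothetical helper, FirstChapterDescendantTexts, which takes Item(0) of the collection (the first .chapter element) and calls getElementsByTagName("*") on that element. It assumes a reference to Microsoft HTML Object Library. Because "*" matches descendants at every depth, the text of a nested element such as <i> is returned on its own and also appears inside its parent's text.

```vba
' Sketch only: index the collection to get one element, then call
' getElementsByTagName("*") on that element, which returns all descendants.
' Assumes a reference to "Microsoft HTML Object Library" (MSHTML).
' FirstChapterDescendantTexts is a hypothetical helper name.
Public Function FirstChapterDescendantTexts(ByVal pageHtml As String) As Collection
    Dim html As New MSHTML.HTMLDocument
    Dim chapters As Object
    Dim el As Object
    Dim result As New Collection

    html.body.innerHTML = pageHtml
    Set chapters = html.getElementsByClassName("chapter")

    ' If there is no .chapter element, an empty Collection is returned
    If chapters.Length > 0 Then
        ' Item(0) is the first matching element; unlike the collection,
        ' an element supports getElementsByTagName
        For Each el In chapters.Item(0).getElementsByTagName("*")
            result.Add el.innerText
        Next el
    End If

    Set FirstChapterDescendantTexts = result
End Function
```

With the same markup, ".chapter > *" returns only the <p>, while this helper returns both the <p> and the nested <i>.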

Upvotes: 0
