Reputation: 45
Edit: thank you for the solutions, guys.
The problem I have with the code below is the elements are not able to get the text under the div class "col-xs-12 col-sm-12 col-md-6 col-lg-5 col-md-pull-6 col-lg-pull-7 p-main-title-wrapper" from the webpage "https://www.racingandsports.com/thoroughbred/jockey/jake-bayliss/27461" and havge it print on the worksheet in excel. The only text that needs to be extracted is "JAKE BAYLISS" and nothing else.
Sub Horse6()
Dim ws As Worksheet
Dim r As Integer
Dim c As Integer
Dim http As New XMLHTTP60
Dim html As New HTMLDocument
Dim node As HTMLHtmlElement
Dim nodeDiv As HTMLHtmlElement
Set ws = ThisWorkbook.Worksheets("Sheet1")
r = 2
c = 12
With http
.Open "GET", "https://www.racingandsports.com/thoroughbred/jockey/jake-bayliss/27461", False
.send
html.body.innerHTML = .responseText
End With
'Problems with the elements are here
With html.getElementsByClassName("np mainparent")
For Each node In html.getElementsByClassName("col-xs-12 col-sm-12 col-md-6 col-lg-5 col-md-pull-6 col-lg-pull-7 p-main-title-wrapper")
For Each nodeDiv In node.getElementsByTagName("div")
ws.Cells(r, c) = .Item(0).innerText
Next
Next
End With
MsgBox "Data input complete"
End Sub
Upvotes: 0
Views: 993
Reputation: 2267
There is also the method getElementsByTagName()
. The desired text is in an h1
tag, namely in the first one. The only line for direct access to it that you need is
html.getElementsByTagName("h1")(0).innertext
I have left out everything that is not needed for direct reading. You can see exactly that the whole loop construction is not necessary.
Sub Horse6()
Dim http As New XMLHTTP60
Dim html As New HTMLDocument
With http
.Open "GET", "https://www.racingandsports.com/thoroughbred/jockey/jake-bayliss/27461", False
.send
html.body.innerHTML = .responseText
End With
MsgBox html.getElementsByTagName("h1")(0).innertext
End Sub
Upvotes: 1
Reputation: 57743
The code part you are speaking about looks like this
<div class="col-xs-12 col-sm-12 col-md-6 col-lg-5 col-md-pull-6 col-lg-pull-7 p-main-title-wrapper">
<h1 style="display:inline !important">JAKE BAYLISS</h1>
</div>
So you need to geth the <h1>
tag inside that and not a <div>
as you tried. See below:
Dim nodeH1 As HTMLHtmlElement
With html.getElementsByClassName("np mainparent")
For Each node In html.getElementsByClassName("col-xs-12 col-sm-12 col-md-6 col-lg-5 col-md-pull-6 col-lg-pull-7 p-main-title-wrapper")
For Each nodeH1 In node.getElementsByTagName("h1")
ws.Cells(r, c) = nodeH1.innerText
Next
Next
End With
Upvotes: 0
Reputation: 469
Have you tried selecting just the class 'p-main-title-wrapper' within your For Each selector? & also, rather than selecting the "div" in the second For Each try selecting the 'h1' element tag as it will be declared separately from the above elements.
Upvotes: 0