NewGuy1
NewGuy1

Reputation: 45

Getting text from a particular div class on a webpage HTML using VBA in excel

Edit: thank you for the solutions, guys.

The problem I have with the code below is the elements are not able to get the text under the div class "col-xs-12 col-sm-12 col-md-6 col-lg-5 col-md-pull-6 col-lg-pull-7 p-main-title-wrapper" from the webpage "https://www.racingandsports.com/thoroughbred/jockey/jake-bayliss/27461" and havge it print on the worksheet in excel. The only text that needs to be extracted is "JAKE BAYLISS" and nothing else.

Sub Horse6()

Dim ws As Worksheet
Dim r As Integer
Dim c As Integer
Dim http As New XMLHTTP60
Dim html As New HTMLDocument
Dim node As HTMLHtmlElement
Dim nodeDiv As HTMLHtmlElement

  Set ws = ThisWorkbook.Worksheets("Sheet1")
  r = 2
  c = 12
  
    With http
    .Open "GET", "https://www.racingandsports.com/thoroughbred/jockey/jake-bayliss/27461", False
    .send
    html.body.innerHTML = .responseText
    End With
  
'Problems with the elements are here

    With html.getElementsByClassName("np mainparent")
        For Each node In html.getElementsByClassName("col-xs-12 col-sm-12 col-md-6 col-lg-5  col-md-pull-6 col-lg-pull-7  p-main-title-wrapper")
            For Each nodeDiv In node.getElementsByTagName("div")
              ws.Cells(r, c) = .Item(0).innerText
            Next
        Next
    End With
  
    MsgBox "Data input complete"

End Sub

Upvotes: 0

Views: 993

Answers (3)

Zwenn
Zwenn

Reputation: 2267

There is also the method getElementsByTagName(). The desired text is in an h1 tag, namely in the first one. The only line for direct access to it that you need is
html.getElementsByTagName("h1")(0).innertext

I have left out everything that is not needed for direct reading. You can see exactly that the whole loop construction is not necessary.

Sub Horse6()

Dim http As New XMLHTTP60
Dim html As New HTMLDocument
  
  With http
    .Open "GET", "https://www.racingandsports.com/thoroughbred/jockey/jake-bayliss/27461", False
    .send
    html.body.innerHTML = .responseText
  End With
  
  MsgBox html.getElementsByTagName("h1")(0).innertext
End Sub

Upvotes: 1

Pᴇʜ
Pᴇʜ

Reputation: 57743

The code part you are speaking about looks like this

<div class="col-xs-12 col-sm-12 col-md-6 col-lg-5 col-md-pull-6 col-lg-pull-7 p-main-title-wrapper">
    <h1 style="display:inline !important">JAKE BAYLISS</h1>
</div>

So you need to geth the <h1> tag inside that and not a <div> as you tried. See below:

Dim nodeH1 As HTMLHtmlElement
With html.getElementsByClassName("np mainparent")
    For Each node In html.getElementsByClassName("col-xs-12 col-sm-12 col-md-6 col-lg-5  col-md-pull-6 col-lg-pull-7  p-main-title-wrapper")
        For Each nodeH1 In node.getElementsByTagName("h1")
          ws.Cells(r, c) = nodeH1.innerText
        Next
    Next
End With

Upvotes: 0

JCrook
JCrook

Reputation: 469

Have you tried selecting just the class 'p-main-title-wrapper' within your For Each selector? & also, rather than selecting the "div" in the second For Each try selecting the 'h1' element tag as it will be declared separately from the above elements.

Upvotes: 0

Related Questions