webscrapingtester
webscrapingtester

Reputation: 11

Web scraping DIV class in VBA

I'm having some trouble programming a VBA code to scrape div class data off a site and drop into excel. I cannot post the URL due to privacy concerns (patient data) but the code is posted below:

<div id="location-1" class="Location">
    <div class="grid">
      <div class="row">
          <div class="info">
            <div class="column">
              <div class="element-1">[text]</div>
              <div class="element-2">[text]</div>
              <p class="element-3"></p>
              <p class="element-4">[text]</p>
              <p class="element-5"></p>
              <p class="element-6">[text]</p>
              <div class="dir">
                    <a href="[link]" class="dir" target="_blank">Get dir</a>
                  </div> 
              </div>
          </div>
      </div>
    </div>
  </div>

My code is posted below. I am trying to scrape the info from "Element-1" and "Element-2" into 1 row for each source. Any help here would be much appreciated!

Sub webscrape()
    Dim http As New XMLHTTP60
    Dim html As New HTMLdocument
    Dim source As Object
    
    With http
    .Open "get", "[link]", False
    .send
    html.body.innerHTML = .responseText
    End With
    
    For Each source In html.getElementsByClassName("column")
    x = x + 1: Cells(x, 1) = source.getAttribute("element-1")
    Cells(x, 2) = source.getAttribute("element-2")
    Next source
    
End Sub

Upvotes: 1

Views: 1612

Answers (1)

Zwenn
Zwenn

Reputation: 2267

Here are two different solutions. (Not tested)

The first one:

Sub webscrape()
  Dim http As New XMLHTTP60
  Dim html As New HTMLdocument
  Dim nodeColumnElements As Object
  Dim currentRow As Long
  
  currentRow = x 'Here your start row
  
  With http
    .Open "get", "[link]", False
    .send
    html.body.innerHTML = .responseText
  End With
  
  Set nodeColumnElements = html.getElementsByClassName("column")(0).getElementsByTagName("div")
  Cells(currentRow, 1) = Trim(nodeColumnElements(0).innertext)
  currentRow = currentRow + 1
  Cells(currentRow, 2) = Trim(nodeColumnElements(1).innertext)
End Sub

The second one grabs both elements directly:

Sub webscrape()
  Dim http As New XMLHTTP60
  Dim html As New HTMLdocument
  Dim currentRow As Long
  
  currentRow = x 'Here your start row
  
  With http
    .Open "get", "[link]", False
    .send
    html.body.innerHTML = .responseText
  End With
  
  Cells(currentRow, 1) = Trim(html.getElementsByClassName("element-1")(0).innertext)
  currentRow = currentRow + 1
  Cells(currentRow, 2) = Trim(html.getElementsByClassName("element-2")(0).innertext)
End Sub

Upvotes: 1

Related Questions