Reputation: 11
I'm having some trouble programming a VBA code to scrape div class data off a site and drop into excel. I cannot post the URL due to privacy concerns (patient data) but the code is posted below:
<div id="location-1" class="Location">
<div class="grid">
<div class="row">
<div class="info">
<div class="column">
<div class="element-1">[text]</div>
<div class="element-2">[text]</div>
<p class="element-3"></p>
<p class="element-4">[text]</p>
<p class="element-5"></p>
<p class="element-6">[text]</p>
<div class="dir">
<a href="[link]" class="dir" target="_blank">Get dir</a>
</div>
</div>
</div>
</div>
</div>
</div>
My code is posted below. I am trying to scrape the info from "Element-1" and "Element-2" into 1 row for each source. Any help here would be much appreciated!
Sub webscrape()
Dim http As New XMLHTTP60
Dim html As New HTMLdocument
Dim source As Object
With http
.Open "get", "[link]", False
.send
html.body.innerHTML = .responseText
End With
For Each source In html.getElementsByClassName("column")
x = x + 1: Cells(x, 1) = source.getAttribute("element-1")
Cells(x, 2) = source.getAttribute("element-2")
Next source
End Sub
Upvotes: 1
Views: 1612
Reputation: 2267
Here are two different solutions. (Not tested)
The first one:
Sub webscrape()
Dim http As New XMLHTTP60
Dim html As New HTMLdocument
Dim nodeColumnElements As Object
Dim currentRow As Long
currentRow = x 'Here your start row
With http
.Open "get", "[link]", False
.send
html.body.innerHTML = .responseText
End With
Set nodeColumnElements = html.getElementsByClassName("column")(0).getElementsByTagName("div")
Cells(currentRow, 1) = Trim(nodeColumnElements(0).innertext)
currentRow = currentRow + 1
Cells(currentRow, 2) = Trim(nodeColumnElements(1).innertext)
End Sub
The second one grabs both elements directly:
Sub webscrape()
Dim http As New XMLHTTP60
Dim html As New HTMLdocument
Dim currentRow As Long
currentRow = x 'Here your start row
With http
.Open "get", "[link]", False
.send
html.body.innerHTML = .responseText
End With
Cells(currentRow, 1) = Trim(html.getElementsByClassName("element-1")(0).innertext)
currentRow = currentRow + 1
Cells(currentRow, 2) = Trim(html.getElementsByClassName("element-2")(0).innertext)
End Sub
Upvotes: 1