Reputation: 95
I have written a macro to scrape product information from a retailer's webpage. It runs fine but does not render any results in my worksheet. I am having a hard time understanding why. I enter "sale" into the search inputbox, leading to the following url:
http://www.shopjustice.com/search/?q=sale&originPageName=home
I want the product's name, former price, and current price in my worksheet. The HTML for these elements is as follows:
<div class="subCatName">
<a href="/girls-clothing/colored-jeggings/6611358/651?pageSort=W3sidHlwZSI6InJlbGV2YW5jZSIsInZhbCI6IiJ9XQ==&productOrigin=search%20page&productGridPlacement=1-1" id="anchor2_6611358" class="auxSubmit">Colored Jeggings</a>
</div>
<div class="cat-list-price subCatPrice">
<div class="priceContainer">
<span class="mobile-was-price">
was
$26.90</span>
<span class="mobile-now-price">
now
$10.49</span>
</div>
<div class="price_description">
<span class="mobile-extra">
Extra 30% off clearance!</span>
</div>
</div>
The code is as follows:
Sub test2()
Dim RowCount, erow As Long
Dim sht As Object
Dim ele As IHTMLElement
Dim eles As IHTMLElementCollection
Dim doc As HTMLDocument
Set sht = Sheets("JUSTICESALE")
RowCount = 1
sht.Range("A" & RowCount) = "Clothing Item"
sht.Range("B" & RowCount) = "SKU"
sht.Range("C" & RowCount) = "Former Price"
sht.Range("D" & RowCount) = "Sale Price"
Set ie = CreateObject("InternetExplorer.application")
searchterm = InputBox("ENTER SEARCH TERM")
Application.StatusBar = "LOADING JUSTICE SEARCH"
With ie
.Visible = True
.navigate "http://www.shopjustice.com/"
Do While .busy Or _
.readystate <> 4
DoEvents
Loop
Set doc = ie.document
doc.getelementsbyname("q").Item.innertext = searchterm
doc.getElementsByClassName("searchbtn").Item.Click
Application.StatusBar = "EXTRACTING PRODUCT DATA"
Set eles = doc.getElementsByClassName("subCatName")
For Each ele In eles
If ele.className = "subCatName" Then
erow = sht.Cells(Rows.count, 1).End(xlUp).Offset(1, 0).Row
Cells(erow, 1) = doc.getElementsByClassName("auxSubmit")(RowCount).innertext
Cells(erow, 2) = doc.getElementsByClassName("mobile-was-price")(RowCount).innertext
RowCount = RowCount + 1
End If
Next ele
End With
Set ie = Nothing
Application.StatusBar = ""
End Sub
Any help would be much appreciated.
EDIT: Hi Peter, I appreciate your insight. It has certainly pre-empted some problems. However, after adding the below code prior to the edited-to-account-for-missing classname loop, it is still not writing to excel.
Do While ie.readyState <> READYSTATE_COMPLETE
DoEvents
Loop
What am I missing?
I have also rendered an alternative method for a different retailer's webpage, albeit the same concept, as shown below. What are your thoughts on this method? My only issue is a Permission Denied Error 70 at the Select Case line.
Sub test5()
Dim erow As Long
Dim ele As Object
Set sht = Sheets("CARTERS")
RowCount = 1
sht.Range("A" & RowCount) = "Clothing Item"
sht.Range("B" & RowCount) = "SKU"
sht.Range("C" & RowCount) = "Former Price"
sht.Range("D" & RowCount) = "Sale Price"
erow = Sheet1.Cells(Rows.count, 1).End(xlUp).Offset(1, 0).Row
Set objIE = CreateObject("Internetexplorer.application")
searchterm = InputBox("ENTER CARTER'S SEARCH TERM")
With objIE
.Visible = True
.navigate "http://www.carters.com/"
Do While .Busy Or _
.readyState <> 4
DoEvents
Loop
.document.getElementsByName("q").Item.innerText = searchterm
.document.getElementsByClassName("btn_search").Item.Click
Do While .readyState <> READYSTATE_COMPLETE
DoEvents
Loop
For Each ele In .document.all
Select Case ele.className
Case “product - name”
RowCount = RowCount + 1
sht.Range("A" & RowCount) = ele.innerText
Case “product - standard - price”
sht.Range("B" & RowCount) = ele.innerText
Case "product-sales-price"
sht.Range("C" & RowCount) = ele.innerText
End Select
Next ele
End With
Set objIE = Nothing
End Sub
And once again, thank you for your help.
Upvotes: 2
Views: 1887
Reputation: 8557
Your code is working just fine, with two caveats...
First, after you "click" the search button on the main page, your code is not waiting for the results page to load. Therefore your loop that looks for each item fails because there's nothing there (yet).
Second, you need some error handling when you're parsing the HTML for certain fields to handle the case when those fields are missing. As an example, look at the code here and apply it to your situation:
For Each ele In eles
If ele.className = "subCatName" Then
erow = sht.Cells(Rows.Count, 1).End(xlUp).Offset(1, 0).Row
On Error Resume Next
Cells(erow, 1) = doc.getElementsByClassName("auxSubmit")(RowCount).innerText
If Err.Number <> 0 Then
Cells(erow, 1) = "ERR: 'auxSubmit' Class Name Not Found!"
Err.Clear
Else
End If
Cells(erow, 2) = doc.getElementsByClassName("mobile-was-price")(RowCount).innerText
If Err.Number <> 0 Then
Cells(erow, 2) = "ERR: 'mobile-was-price' Class Name Not Found!"
Err.Clear
End If
On Error GoTo 0
RowCount = RowCount + 1
End If
Next ele
Upvotes: 1