JOAO R
JOAO R

Reputation: 11

Webscraping of product prices and specs

I'm new to the VBA world and i was trying to create a Excel file to get the cheapest price and name tag on a website. I've created one file that goes through the whole search list and gathers each product URL and places on a spreadsheet. My challenge now is in making this second code work. It starts out getting the information without any issues but then, after 10 or more URLs the macro gives a bug and it starts repeating the information on all the following cells.. Is there a way i can make the code run slower so it doesn't get into this problem?

I'll list here my code and a sample of the URLs im scraping..

here's the code i've been using:

Sub test()
Dim URL As String

Set ie = CreateObject("internetexplorer.application")

For i = 2 To 300

URL = Cells(i, 1).Value

ie.navigate URL
ie.Visible = False

Do While ie.busy And ie.readystate <> "readystate_complete"
DoEvents
Loop

Cells(i, 3) = ie.document.getElementsByTagName("h1")(0).innerText
Cells(i, 4) = ie.document.getElementsByTagName("strong")(0).innerText
Next i

ie.Quit

MsgBox "acabou"

End Sub

<https://www.zoom.com.br/notebook/notebook-vaio-vjf157f11x-b0211s-intel-core-i5-8250u-15-6-8gb-ssd-256-gb-windows-10-8-geracao>
<https://www.zoom.com.br/notebook/notebook-samsung-chromebook-plus-intel-celeron-3965y-12-2-4gb-emmc-32-gb-chrome-os-touchscreen>
<https://www.zoom.com.br/notebook/notebook-dell-xps-7390-intel-core-i7-10710u-13-3-16gb-ssd-512-gb-windows-10-touchscreen>
<https://www.zoom.com.br/notebook/notebook-dell-i15-3583-a5-intel-core-i7-8565u-15-6-8gb-hd-2-tb-windows-10-8-geracao>
<https://www.zoom.com.br/notebook/notebook-lenovo-b330-intel-core-i5-8250u-15-6-4gb-hd-1-tb-windows-10-8-geracao>
<https://www.zoom.com.br/notebook/notebook-dell-i15-7580-a20-intel-core-i7-8550u-15-6-8gb-hd-1-tb-geforce-mx150-windows-10>
<https://www.zoom.com.br/notebook/notebook-dell-i14-3480-u30-intel-core-i5-8265u-14-4gb-hd-1-tb-linux-8-geracao>
<https://www.zoom.com.br/notebook/macbook-pro-apple-muhn2bz-intel-core-i5-13-3-8gb-ssd-128-gb-tela-de-retina>
<https://www.zoom.com.br/notebook/notebook-multilaser-pc150-amd-a4-9120-14-2gb-emmc-32-gb-windows-10>
<https://www.zoom.com.br/notebook/notebook-samsung-np930qaa-kw1br-intel-core-i7-8550u-13-3-8gb-ssd-256-gb-windows-10-touchscreen>
<https://www.zoom.com.br/notebook/notebook-acer-a515-51g-58vh-intel-core-i5-7200u-15-6-8gb-hd-1-tb-geforce-940mx>
<https://www.zoom.com.br/notebook/notebook-multilaser-pc222-intel-celeron-dual-core-13-3-4gb-emmc-64-gb-windows-10>
<https://www.zoom.com.br/notebook/notebook-acer-pt515-51-788a-intel-core-i7-9750h-15-6-32gb-ssd-1-tb-geforce-rtx-2070-windows-10>
<https://www.zoom.com.br/notebook/notebook-acer-a315-53-53ak-intel-core-i5-7200u-15-6-4gb-hd-1-tb-windows-10-7-geracao>
<https://www.zoom.com.br/notebook/notebook-dell-i15-5584-m40-intel-core-i7-8565u-15-6-8gb-hd-2-tb-geforce-mx130-windows-10>
<https://www.zoom.com.br/notebook/notebook-acer-a315-41g-r21b-amd-ryzen-5-2500u-15-6-8gb-hd-1-tb-radeon-535-windows-10>
<https://www.zoom.com.br/notebook/notebook-positivo-master-n2140-intel-core-i3-7020u-14-4gb-hd-500-gb-windows-10-7-geracao>
<https://www.zoom.com.br/notebook/notebook-multilaser-pc101-intel-atom-14-1gb-ssd-32-gb-windows-10>
<https://www.zoom.com.br/notebook/notebook-lenovo-b330-intel-core-i5-8250u-15-6-8gb-hd-1-tb-windows-10-8-geracao>
<https://www.zoom.com.br/notebook/notebook-acer-an515-51-77fh-intel-core-i7-7700hq-15-6-8gb-hd-1-tb-geforce-gtx-1050-windows-10>
<https://www.zoom.com.br/notebook/notebook-dell-i15-3583-a2yp-intel-core-i5-8265u-15-6-4gb-optane-16-gb-hd-1-tb-windows-10>
<https://www.zoom.com.br/notebook/notebook-asus-g531gt-intel-core-i7-9750h-15-6-16gb-ssd-512-gb-geforce-gtx-1650-windows-10>
<https://www.zoom.com.br/notebook/notebook-vaio-fit-15s-intel-core-i3-7100u-15-6-4gb-hd-1-tb-windows-10-home>
<https://www.zoom.com.br/notebook/notebook-samsung-s50-intel-core-i7-7500u-13-3-8gb-ssd-256-gb-windows-10-style>
<https://www.zoom.com.br/notebook/notebook-lenovo-b330-intel-core-i3-7020u-15-6-4gb-ssd-120-gb-windows-10-7-geracao>

Upvotes: 1

Views: 81

Answers (3)

Mehdi Naciri
Mehdi Naciri

Reputation: 1

In python I will do :

  • You should first import library

from time import sleep

Two functions available :

  • this function let you sleep before execute every 3 seconds

      time.sleep(3)
    
  • this function let you random sleep between every execute from 1 to 3 seconds

      sleep(randint(1, 3)
    

Note: take aware about :

  • cookies because sometimes you need to post request to scrape some id before get.
  • the syntax of your header and set correctly origin & referrer parameters.

Upvotes: 0

YasserKhalil
YasserKhalil

Reputation: 9568

Try this

Sub GetPrices()
    Dim html As MSHTML.HTMLDocument, r As Long
    For r = 1 To 4
        Set html = GetHTML(Cells(r, 1).Value)
        Cells(r, 3).Value = html.querySelector("h1.product-name").innerText
        Cells(r, 4).Value = Replace(Replace(html.querySelector(".product-price").innerText, "a partir de  ", vbNullString), ":( ", "")
        Set html = Nothing
    Next r
End Sub

Function GetHTML(ByVal sURL As String) As HTMLDocument
    Dim http As MSXML2.XMLHTTP60, html As MSHTML.HTMLDocument
    Set http = New MSXML2.XMLHTTP60
    Set html = New MSHTML.HTMLDocument
    With http
        .Open "Get", sURL, False
        .send
        html.body.innerHTML = .responseText
    End With
    Set GetHTML = html
End Function

Upvotes: 1

Zwenn
Zwenn

Reputation: 2267

First of all:
Always declare all variables. To force this, always write Option Explicit as the first line in each module. This way, especially typos in variable names are immediately detected by the compiler.

Here is what to do about your problem:
IE is sometimes a real diva. For example, it doesn't like to have to process URLs in the same instance in quick succession. Therefore it is advisable to kick it out of memory and restart it for each new URL.

To restart it quickly, the deletion of coockies, the cache, etc. must not be set in its settings. Otherwise automation errors will occur.

Try this macro. With the given URLs it works:

Option Explicit

Sub test()

Dim URL As String
Dim ie As Object
Dim i As Long

  For i = 2 To 300

    If i > 14 Then
      ActiveWindow.SmallScroll down:=1
    End If

    URL = ActiveSheet.Cells(i, 1).Value

    Set ie = CreateObject("internetexplorer.application")
    ie.navigate URL
    ie.Visible = False
    Do While ie.readystate <> 4: DoEvents: Loop

    ActiveSheet.Cells(i, 3) = ie.document.getElementsByTagName("h1")(0).innerText
    ActiveSheet.Cells(i, 4) = ie.document.getElementsByTagName("strong")(0).innerText

    ie.Quit
    Set ie = Nothing
  Next i

  MsgBox "acabou"
End Sub

Upvotes: 2

Related Questions