talbert.houle
talbert.houle

Reputation: 215

VBA Scrape Content Generated by JavaScript

I am looking to grab the Recommended Customer Pricing information from a URL that is defined in an Excel worksheet using VBA. Those values are in Excel in Cells(i,11), which all point to a specific page on https://ark.intel.com. The values begin on row 5.

For example, if I want to find the price for the Intel Xeon 8268 I would navigate to https://ark.intel.com/content/www/us/en/ark/products/192481/intel-xeon-platinum-8268-processor-35-75m-cache-2-90-ghz.html. If viewing source, it is obvious this content is generated with JavaScript, so I instead use "Inspect Element" option on Firefox web browser.

From here, I can navigate down and find what I am looking for in the tag. See image below:

Recommendd Customer Price for Intel Xeon 8268

I am unable to capture that value and write it to an excel column, which would be Column E. Below is one attempt I have made:

Sub ProcessorPricing()
    Dim URL As String, lastRow As Long
    Dim XMLHTTP As Object, HTML As Object, objResult As Object, Price As Object

    lastRow = Range("A" & Rows.Count).End(xlUp).row

    Dim cookie As String
    Dim result_cookie As String

    For i = 5 To lastRow

        If Cells(i, 1) <> "" Then

            URL = Cells(i, 11)

            Set XMLHTTP = CreateObject("MSXML2.serverXMLHTTP")
            XMLHTTP.Open "GET", URL, False
            XMLHTTP.setRequestHeader "Content-Type", "text/xml"
            XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
            XMLHTTP.send

            Set HTML = CreateObject("htmlfile")
            HTML.body.innerHTML = XMLHTTP.responseText

            Set objResult = html.getElementsByID("bladeInside")
            Set Price = objResult.getElementsByTagName("span")(0)

            Cells(i, 5) = Price.Value
            DoEvents
        End If
    Next
End Sub

Any help would be greatly appreciated.

PS - I have also tried the code found at https://www.myonlinetraininghub.com/web-scraping-with-vba to no avail either

UPDATE:

Was able to get everything working with your help. Thank you, Bertrand Martel and Stavros Jon.

Here is the entire script:

Sub UpdateProcessorInfo()
'requirements:  JSON Parser installation needs to be added to project - https://github.com/VBA-tools/VBA-JSON - (Download latest release -> Import JsonConverter.bas -> File -> Import File)
'requirements:  Windows only, include Reference to "Microsoft Scripting Runtime" (Tools -> References -> Check Microsoft Scripting Runtime)
'requirements:  Add a refernce to Microsoft WinHTTP Services 5.1.  (Tools -> References -> Check Microsoft WinHTTP Services 5.1)

Dim Connection As WorkbookConnection
Dim url As String, lastRow As Long
Dim XMLHTTP As Object, html As Object, objResultDiv As Object, link As Object
Dim cookie As String
Dim result_cookie As String
Dim req As New WinHttpRequest
Dim ids As String
Dim responseJSON As Object

For Each Connection In ThisWorkbook.Connections
    Connection.Refresh
Next Connection

Worksheets("Processor_DB_Intel").Range("A2:A1000").Copy
Worksheets("Processor Comparisons").Range("A5").PasteSpecial Paste:=xlPasteValues

lastRow = Range("A" & Rows.Count).End(xlUp).row

Range("k5:k300").Clear

For i = 5 To lastRow

    If Cells(i, 1) <> "" Then

        url = "https://www.google.com/search?q=" & "site:ark.intel.com " & Cells(i, 1) & "&rnd=" & WorksheetFunction.RandBetween(1, 10000)

        Set XMLHTTP = CreateObject("MSXML2.serverXMLHTTP")
        XMLHTTP.Open "GET", url, False
        XMLHTTP.setRequestHeader "Content-Type", "text/xml"
        XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
        XMLHTTP.send

        Set html = CreateObject("htmlfile")
        html.body.innerHTML = XMLHTTP.responseText
        Set objResultDiv = html.getElementById("rso")
        Set link = objResultDiv.getElementsByTagName("a")(0)

        Cells(i, 11) = link
        DoEvents
    End If

Next

lastRow = Range("A" & Rows.Count).End(xlUp).row

For i = 5 To lastRow

    ids = Cells(i, 13)
    url = "https://ark.intel.com/libs/apps/intel/support/ark/recommendedCustomerPrice?ids=" & ids & "&siteName=ark"

    If Cells(i, 1) <> "" Then

        With req
            .Open "GET", url, False
            .send
            Set responseJSON = JsonConverter.ParseJson(.responseText)
        End With

        On Error Resume Next

        'Debug.Print responseJSON(1)("displayPrice")
        Cells(i, 14) = responseJSON(1)("displayPrice")

    End If
Next

End Sub

Upvotes: 2

Views: 1629

Answers (2)

Stavros Jon
Stavros Jon

Reputation: 1697

AS @Bertrand Martel pointed out, there's a very convenient API you can use to grab the info you need.

To further elaborate on his answer and since you're having trouble extracting the price from the JSON response, here's my two cents.

You'll need to add this JSON parser to your project. Follow the installation instructions in the link.

The response's structure looks like this:

enter image description here

So it all comes down to this:

Option Explicit

Sub intel()
Dim req As New WinHttpRequest 'add a reference to Microsoft WinHTTP Services 5.1. MSXML2 works fine as well
Dim url As String, ids As String
Dim responseJSON As Object
ids = "192481"
url = "https://ark.intel.com/libs/apps/intel/support/ark/recommendedCustomerPrice?ids=" & ids & "&siteName=ark"
With req
    .Open "GET", url, False
    .send
    Set responseJSON = JsonConverter.ParseJson(.responseText)
End With
Debug.Print responseJSON(1)("displayPrice") 'For demonstration purposes the price is printed in the immediate window
End Sub

Upvotes: 4

Bertrand Martel
Bertrand Martel

Reputation: 45352

As you have noticed the data is not embedded in the html but loaded via Javascript using an external JSON API:

https://ark.intel.com/libs/apps/intel/support/ark/recommendedCustomerPrice?ids=192481&mmids=999C0G&siteName=ark

This URL is constructed using the product ID 192481 and the siteName ark. Droping the mmids returns only the product which should be sufficient (unless you need the orderingCode ?):

https://ark.intel.com/libs/apps/intel/support/ark/recommendedCustomerPrice?ids=192481&siteName=ark

The idea is that you extract the product ID from your original URL :

https://ark.intel.com/content/www/us/en/ark/products/[PRODUCT_ID_HERE]/intel-xeon-platinum-8268-processor-35-75m-cache-2-90-ghz.html.

and call this API instead

Upvotes: 1

Related Questions