Ron Burgundy
Ron Burgundy

Reputation: 29

Scraping data from website to Excel using a macro...lost

I am totally new to this but here is my scope. I am running a macro to pull data from a business system. After this info is pulled, I want a macro to take certain fields, put them into a website form, click submit and then scrape and paste certain data results back into excel. Everything works minus the scraping and pasting back into excel.

Help please!

I have searched all over stack overflow and watched vids to try and figure out what I need to do but I must be misunderstanding something.

Sub Track()
Range("B2").Select

'This should call to PT and deliver tracking info

Dim IE As Object
Dim tbl As Object, td As Object



 Set IE = CreateObject("InternetExplorer.Application") 'Set IEapp = 
 InternetExplorer
 IE.Visible = True

      IE.Navigate "https://www.partstown.com/track-my-order"
      With IEapp
          Do
          DoEvents
          Loop Until IE.readyState = 4



'Input PO and zip
 Call IE.Document.getElementById("orderNo").SetAttribute("value", 
 "4500969111")
'ActiveCell.Offset(0, 2).Select
 Call IE.Document.getElementById("postalCode").SetAttribute("value", 
 "37040")
 IE.Document.forms(7).Submit

 Application.Wait Now + TimeValue("00:00:09")

'this is where i am stuck. I know this isnt right but tried to piece it 
 together
 Set elemCollection = IE.Document.getelElementsByTagname("table.account- 
 table details _tc_table_highlighted")

 For t = 0 To (elemCollection.Length - 1)
 For r = 0 To (elemCollection(t).Rows.Length - 1)
    For c = 0 To (elemCollection(t).Rows(r).Cells.Length - 1)
 ThisWorkbook.Worksheets(1).Cells(r + 1, c + 1) = 
 elemCollection(t).Rows.Cells(c).innertext
 Next c
 Next r
 Next t

 End With


 End Sub

Here is what I want it to pull: Shipping column QTY ordered QTY shipped Product And to display in a linear fashion: Shipping, QTY ordered, QTY shipped, Product

Upvotes: 2

Views: 552

Answers (1)

QHarr
QHarr

Reputation: 84475

Internet Explorer:

I have made this a little more verbose than usual so you can see each step.

Key things:

1) proper page loads waits with While .Busy Or .readyState < 4: DoEvents: Wend

2) selecting elements by id where possible. The # is a css id selector. css selectors are applied by querySelector method of .document and retrieve the first element in the page which matches the specified pattern

3) a timed loop is needed to wait for results to be present

4) the order qty etc info is a newline divided string. It seemed easiest to split on these newlines and then access individual items from the resultant array by index

5) I order, per your specification, the results in an array and write that array out in one go to the sheet

6) The "." is a class selector in .order-history__item-descript--min i.e. return the first element with class of order-history__item-descript--min

7) The [x=y] is an attribute = value selector in [data-label=Shipping] i.e. return the first element with data-label attribute having value Shipping

8) The combination of .details-table a is using a descendant combinator, " ", to specify I want a tag elements that have a parent with class .details-table

VBA:

Option Explicit

'VBE > Tools > References:
' Microsoft Internet Controls
Public Sub RetrieveInfo()
    Dim ie As InternetExplorer, ele As Object, t As Date
    Const MAX_WAIT_SEC As Long = 5

    Set ie = New InternetExplorer

    With ie
        .Visible = True
        .Navigate2 "https://www.partstown.com/track-my-order"

        While .Busy Or .readyState < 4: DoEvents: Wend

        With .document
            .querySelector("#orderNo").Value = "4500969111"
            .querySelector("#postalCode").Value = "37040"
            .querySelector("#orderLookUpForm").submit  
        End With

        While .Busy Or .readyState < 4: DoEvents: Wend

        Dim shipping As String, order As String, items() As String
        With .document
            t = Timer
            Do
                On Error Resume Next
                Set ele = .querySelector("[data-label=Shipping]")
                On Error GoTo 0
                If Timer - t > MAX_WAIT_SEC Then Exit Do
            Loop While ele Is Nothing

            If ele Is Nothing Then Exit Sub

            shipping = ele.innerText
            order = .querySelector(".order-history__item-descript--min").innerText
            items = Split(order, vbNewLine)

            Dim qtyOrdered As Long, qtyShipped As String, product As String

            qtyOrdered = CLng(Replace$(items(0), "Qty Ordered: ", vbNullString))
            qtyShipped = CLng(Replace$(items(1), "Qty Shipped: ", vbNullString))
            product = .querySelector(".details-table a").Title

            Dim results()
            results = Array(shipping, qtyOrdered, qtyShipped, product)
            ThisWorkbook.Worksheets("Sheet1").Cells(1, 1).Resize(1, UBound(results) + 1) = results

        End With
        .Quit
    End With
End Sub

If new to HTML please look at:

https://developer.mozilla.org/en-US/docs/Web/HTML

If new to css selectors please look at:

https://flukeout.github.io/


XMLHTTP:

The whole thing can also be done with XHR. This is much faster than opening a browser.

XHR:

Use XMLHttpRequest (XHR) objects to interact with servers. You can retrieve data from a URL without having to do a full page [render]

In this case I do an initial GET request to the landing page to retrieve the CSRFToken to use in my re-enactment of the POST request the page makes to the server when you manually input data and press submit. You get the data you want in the server response. I pass a query string in the body of the POST send line .send "orderNo=4500969111&postalCode=37040&CSRFToken=" & csrft ; you can see your parameters there.

Option Explicit
Public Sub GetInfo()
    Dim html As HTMLDocument, csrft As String  '<  VBE > Tools > References > Microsoft HTML Object Library
    Set html = New HTMLDocument

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.partstown.com", False
        .send

        html.body.innerHTML = .responseText

        csrft = html.querySelector("[name=CSRFToken]").Value

        .Open "POST", "https://www.partstown.com/track-my-order", False
        .setRequestHeader "Referer", "https://www.partstown.com/track-my-order"
        .setRequestHeader "User-Agent", "Mozilla/5.0"
        .setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
        .setRequestHeader "Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8"
        .setRequestHeader "Accept-Encoding", "gzip, deflate"
        .setRequestHeader "Accept-Language", "en-US,en;q=0.9"
        .send "orderNo=4500969111&postalCode=37040&CSRFToken=" & csrft

        html.body.innerHTML = .responseText
    End With

    Dim shipping As String, order As String, items() As String

    shipping = html.querySelector("[data-label=Shipping]").innerText
    order = html.querySelector(".order-history__item-descript--min").innerText
    items = Split(order, vbNewLine)

    Dim qtyOrdered As Long, qtyShipped As String, product As String

    qtyOrdered = CLng(Replace$(items(0), "Qty Ordered: ", vbNullString))
    qtyShipped = CLng(Replace$(items(1), "Qty Shipped: ", vbNullString))
    product = html.querySelector(".details-table a").Title

    Dim results()
    results = Array(shipping, qtyOrdered, qtyShipped, product)
    ThisWorkbook.Worksheets("Sheet1").Cells(1, 1).Resize(1, UBound(results) + 1) = results
End Sub

Example of loop:

Option Explicit

Public Sub GetInfo()
    Dim html As HTMLDocument, csrft As String, lastRow As Long, sourceValues() '<  VBE > Tools > References > Microsoft HTML Object Library
    Set html = New HTMLDocument
    Dim ws As Worksheet, i As Long
    Set ws = ThisWorkbook.Worksheets("Sheet4")
    lastRow = ws.Cells(ws.rows.Count, "B").End(xlUp).Row
    sourceValues = ws.Range("B2:D" & lastRow).Value
    Dim results()
    ReDim results(1 To UBound(sourceValues, 1), 1 To 4)
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.partstown.com", False
        .send
        html.body.innerHTML = .responseText

        csrft = html.querySelector("[name=CSRFToken]").Value
        Stop
        For i = LBound(sourceValues, 1) To UBound(sourceValues, 1)
            If sourceValues(i, 1) <> vbNullString And sourceValues(i, 3) <> vbNullString Then
                DoEvents
                .Open "POST", "https://www.partstown.com/track-my-order", False
                .setRequestHeader "Referer", "https://www.partstown.com/track-my-order"
                .setRequestHeader "User-Agent", "Mozilla/5.0"
                .setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
                .setRequestHeader "Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8"
                .setRequestHeader "Accept-Encoding", "gzip, deflate"
                .setRequestHeader "Accept-Language", "en-US,en;q=0.9"
                .send "orderNo=" & sourceValues(i, 1) & "&postalCode=" & sourceValues(i, 3) & "&CSRFToken=" & csrft

                html.body.innerHTML = .responseText

                Dim shipping As String, order As String, items() As String

                shipping = html.querySelector("[data-label=Shipping]").innerText
                order = html.querySelector(".order-history__item-descript--min").innerText
                items = Split(order, vbNewLine)

                Dim qtyOrdered As Long, qtyShipped As String, product As String

                qtyOrdered = CLng(Replace$(items(0), "Qty Ordered: ", vbNullString))
                qtyShipped = CLng(Replace$(items(1), "Qty Shipped: ", vbNullString))
                product = html.querySelector(".details-table a").Title

                results(i, 1) = shipping
                results(i, 2) = qtyOrdered
                results(i, 3) = qtyShipped
                results(i, 4) = product
            End If
            'Application.Wait Now + TimeSerial(0, 0, 1)
        Next
    End With
    'results written out from row 2 column E
    ws.Cells(2, 5).Resize(UBound(results, 1), UBound(results, 2)) = results
End Sub

Upvotes: 2

Related Questions