saravana
saravana

Reputation: 39

How to Select Element That Does Not have Specific Class name using query selector?

I am doing VBA web scraping for tracking the status of fedex shipment.

The URL is
https://www.fedex.com/apps/fedextrack/index.html?tracknumbers=475762806100&cntry_code=in

The current status of this shipment is delivered. I want to extract this.

I know how to select elements that have a class name.

The above website's HTML code does not have class names.

How do I select an element that does not have a specific class name using queryselector?

I am using the fastest method which is MSXML2.XMLHTTP in my VBA code. One drawback of this method is getelementbyclassname will not work in this method or any other way to make the program to sense the getelementbyclassname.

This is the reason I chose to use queryselector.

I am not able to pick the correct element using query selector.

enter image description here

From the HTML codes, How do I get the inner text of the class named ----- "redesignStatusChevronTVC tank-results-item__data-label-large tank-text-center statusChevron_key_status"?

<h3 class="redesignStatusChevronTVC tank-results-item__data-label-large tank-text-center statusChevron_key_status">Delivered</h3>
Sub GetInfo()
    Dim sResponse As String, i As Long, html As New HTMLDocument
    Dim prices As Object, info As Object
    Application.ScreenUpdating = False

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.fedex.com/apps/fedextrack/index.html?tracknumbers=475762806100&cntry_code=in", False
        .send
        sResponse = .responseText
    End With

    With html
        .body.innerHTML = sResponse
        Set info = .querySelectorAll("redesignStatusChevronTVC tank-results-item__data-label-large tank-text-center statusChevron_key_status")
    End With

    With Worksheets(3)
        Worksheets(3).Activate '
        For i = 0 To info.Length - 1
           Debug.Print info(i).innerText
        Next i
    End With
    Application.ScreenUpdating = True
End Sub

Upvotes: 0

Views: 160

Answers (1)

chillin
chillin

Reputation: 4486

Assuming I'm reading things correctly on my end, I think the HTML returned by the server does not contain the information you're looking for.

To confirm this, try printing:

InStr(1, sResponse, "redesignStatusChevronTVC tank-results-item__data-label-large tank-text-center statusChevron_key_status", vbTextCompare)

to the immediate window and you should see it return 0 (meaning that text is not present within the response text).

The information you're trying to scrape (and even the element which needs selecting with .redesignStatusChevronTVC.tank-results-item__data-label-large.tank-text-center.statusChevron_key_status) is populated dynamically via JavaScript and does not exist at the time you're trying to access it.


From what I can see, the web page makes an HTTP POST request, and the server returns some JSON, which represents information relating to that tracking number. Try the code below (run the procedure JustATest), which tries to make the same HTTP POST request:

Option Explicit

Private Sub JustATest()
    MsgBox "Delivery status is: " & GetDeliveryStatusForPackage("475762806100", "en_IN")
End Sub


Private Function GetDeliveryStatusForPackage(ByVal trackingNumber As String, ByVal localeValue As String)
    ' Given a "trackingNumber" and "localeValue", should return the delivery status of that package.
    Dim jsonResponse As String
    jsonResponse = GetFedExJson(trackingNumber, localeValue)

    GetDeliveryStatusForPackage = ExtractDeliveryStatusFromJson(jsonResponse)
End Function


Private Function ExtractDeliveryStatusFromJson(ByVal someJson As String) As String
    ' Should extract the delivery status. This function treats the JSON
    ' encoded string as a string and hence relies on basic string matching.
    Const START_DELIMITER As String = """keyStatus"":"""

    Dim startDelimiterIndex As Long
    startDelimiterIndex = InStr(1, someJson, START_DELIMITER)
    Debug.Assert startDelimiterIndex > 0
    startDelimiterIndex = startDelimiterIndex + Len(START_DELIMITER)

    Dim endDelimiterIndex As Long
    endDelimiterIndex = InStr(startDelimiterIndex + 1, someJson, """", vbBinaryCompare)
    Debug.Assert endDelimiterIndex > 0

    ExtractDeliveryStatusFromJson = Mid$(someJson, startDelimiterIndex, endDelimiterIndex - startDelimiterIndex)
End Function


Private Function GetFedExJson(ByVal trackingNumber As String, ByVal localeValue As String) As String
    ' Should return a JSON-encoded response. The return value can be
    ' passed to a function that parses JSON (if such a function is available for use).
    Dim formToPost As String
    formToPost = CreateFedExForm(trackingNumber, localeValue)

    Const TARGET_URL As String = "https://www.fedex.com/trackingCal/track"

    With CreateObject("MSXML2.XMLHTTP")
        .Open "POST", TARGET_URL, False
        .SetRequestHeader "Connection", "keep-alive"
        .SetRequestHeader "Content-Type", "application/x-www-form-urlencoded"
        .SetRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36"
        .Send formToPost
        Debug.Assert InStr(1, .ResponseText, "{""TrackPackagesResponse"":{""successful"":true,", vbBinaryCompare)
        GetFedExJson = .ResponseText
    End With
End Function


Private Function CreateFedExForm(ByVal trackingNumber As String, ByVal localeValue As String) As String
    ' Should return a string representing a form of URL encoded name-value pairs.
    Dim data As String
    data = "{""TrackPackagesRequest"":{""appType"":""WTRK"",""appDeviceType"":""DESKTOP"",""supportHTML"":true,""supportCurrentLocation"":true,""uniqueKey"":"""",""processingParameters"":{},""trackingInfoList"":[{""trackNumberInfo"":{""trackingNumber"":""" & trackingNumber & """,""trackingQualifier"":"""",""trackingCarrier"":""""}}]}}"
    CreateFedExForm = "data=" & Application.EncodeURL(data) & "&action=trackpackages&locale=" & Application.EncodeURL(localeValue) & "&version=1&format=json"
End Function

  • If it works, then the function GetDeliveryStatusForPackage seems capable of returning the delivery status of a given trackingNumber and localeValue.
  • It's worth noting that the JSON returned by the server contains other information (which you didn't ask for in your question, but you might find relevant/useful). It's too long to post here, but you can explore it for yourself.
  • I think it might be possible to get information for multiple tracking numbers in a single request. (I say this because in the request TrackPackagesRequest.trackingInfoList is an array -- and in the response, TrackPackagesResponse.packageList is also an array). It's just a supposition/rational guess at this stage, but might be something that can potentially reduce how long your code takes to finish.
  • It might be worth getting a VBA module (https://github.com/VBA-tools/VBA-JSON) which supports JSON parsing. I didn't bother, since you only wanted the delivery status. But deserialising the response would be the proper way to do it (especially in terms of accessing the correct property path).
  • You might also want to check if the terms, which govern your usage of their website, expressly forbid web scraping or any other similar activities.

Regarding nested keyStatus property's value being "In transit" for invalid tracking numbers, check property path TrackPackagesResponse.packageList[0].errorList[0], where there is an object. For invalid tracking numbers it seems to be {"code":"1041","message":"This tracking number cannot be found. Please check the number or contact the sender."... -- and for valid tracking numbers, both the code and message properties appear to be zero-length strings.

It might be good to now get the VBA JSON module that I mention above, since there are two errorList objects (at different levels of nesting) and you want to make sure you're accessing the correct one.

The change required in the code would probably be to first check if the code and message properties of TrackPackagesResponse.packageList[0].errorList[0] indicate the tracking number is invalid (and return message if invalid). Otherwise, return TrackPackagesResponse.packageList[0].keyStatus. I don't have time to implement these changes right now. But I think it's something you can do (unless you're really unsure, in which case let me know which bit you need help with).

Upvotes: 1

Related Questions