Reputation: 39
I am doing VBA web scraping for tracking the status of fedex shipment.
The URL is
https://www.fedex.com/apps/fedextrack/index.html?tracknumbers=475762806100&cntry_code=in
The current status of this shipment is delivered. I want to extract this.
I know how to select elements that have a class name.
The above website's HTML code does not have class names.
How do I select an element that does not have a specific class name using queryselector?
I am using the fastest method which is MSXML2.XMLHTTP in my VBA code. One drawback of this method is getelementbyclassname will not work in this method or any other way to make the program to sense the getelementbyclassname.
This is the reason I chose to use queryselector.
I am not able to pick the correct element using query selector.
From the HTML codes, How do I get the inner text of the class named ----- "redesignStatusChevronTVC tank-results-item__data-label-large tank-text-center statusChevron_key_status"?
<h3 class="redesignStatusChevronTVC tank-results-item__data-label-large tank-text-center statusChevron_key_status">Delivered</h3>
Sub GetInfo()
Dim sResponse As String, i As Long, html As New HTMLDocument
Dim prices As Object, info As Object
Application.ScreenUpdating = False
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.fedex.com/apps/fedextrack/index.html?tracknumbers=475762806100&cntry_code=in", False
.send
sResponse = .responseText
End With
With html
.body.innerHTML = sResponse
Set info = .querySelectorAll("redesignStatusChevronTVC tank-results-item__data-label-large tank-text-center statusChevron_key_status")
End With
With Worksheets(3)
Worksheets(3).Activate '
For i = 0 To info.Length - 1
Debug.Print info(i).innerText
Next i
End With
Application.ScreenUpdating = True
End Sub
Upvotes: 0
Views: 160
Reputation: 4486
Assuming I'm reading things correctly on my end, I think the HTML returned by the server does not contain the information you're looking for.
To confirm this, try printing:
InStr(1, sResponse, "redesignStatusChevronTVC tank-results-item__data-label-large tank-text-center statusChevron_key_status", vbTextCompare)
to the immediate window and you should see it return 0
(meaning that text is not present within the response text).
The information you're trying to scrape (and even the element which needs selecting with .redesignStatusChevronTVC.tank-results-item__data-label-large.tank-text-center.statusChevron_key_status
) is populated dynamically via JavaScript and does not exist at the time you're trying to access it.
From what I can see, the web page makes an HTTP POST request, and the server returns some JSON, which represents information relating to that tracking number. Try the code below (run the procedure JustATest
), which tries to make the same HTTP POST request:
Option Explicit
Private Sub JustATest()
MsgBox "Delivery status is: " & GetDeliveryStatusForPackage("475762806100", "en_IN")
End Sub
Private Function GetDeliveryStatusForPackage(ByVal trackingNumber As String, ByVal localeValue As String)
' Given a "trackingNumber" and "localeValue", should return the delivery status of that package.
Dim jsonResponse As String
jsonResponse = GetFedExJson(trackingNumber, localeValue)
GetDeliveryStatusForPackage = ExtractDeliveryStatusFromJson(jsonResponse)
End Function
Private Function ExtractDeliveryStatusFromJson(ByVal someJson As String) As String
' Should extract the delivery status. This function treats the JSON
' encoded string as a string and hence relies on basic string matching.
Const START_DELIMITER As String = """keyStatus"":"""
Dim startDelimiterIndex As Long
startDelimiterIndex = InStr(1, someJson, START_DELIMITER)
Debug.Assert startDelimiterIndex > 0
startDelimiterIndex = startDelimiterIndex + Len(START_DELIMITER)
Dim endDelimiterIndex As Long
endDelimiterIndex = InStr(startDelimiterIndex + 1, someJson, """", vbBinaryCompare)
Debug.Assert endDelimiterIndex > 0
ExtractDeliveryStatusFromJson = Mid$(someJson, startDelimiterIndex, endDelimiterIndex - startDelimiterIndex)
End Function
Private Function GetFedExJson(ByVal trackingNumber As String, ByVal localeValue As String) As String
' Should return a JSON-encoded response. The return value can be
' passed to a function that parses JSON (if such a function is available for use).
Dim formToPost As String
formToPost = CreateFedExForm(trackingNumber, localeValue)
Const TARGET_URL As String = "https://www.fedex.com/trackingCal/track"
With CreateObject("MSXML2.XMLHTTP")
.Open "POST", TARGET_URL, False
.SetRequestHeader "Connection", "keep-alive"
.SetRequestHeader "Content-Type", "application/x-www-form-urlencoded"
.SetRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36"
.Send formToPost
Debug.Assert InStr(1, .ResponseText, "{""TrackPackagesResponse"":{""successful"":true,", vbBinaryCompare)
GetFedExJson = .ResponseText
End With
End Function
Private Function CreateFedExForm(ByVal trackingNumber As String, ByVal localeValue As String) As String
' Should return a string representing a form of URL encoded name-value pairs.
Dim data As String
data = "{""TrackPackagesRequest"":{""appType"":""WTRK"",""appDeviceType"":""DESKTOP"",""supportHTML"":true,""supportCurrentLocation"":true,""uniqueKey"":"""",""processingParameters"":{},""trackingInfoList"":[{""trackNumberInfo"":{""trackingNumber"":""" & trackingNumber & """,""trackingQualifier"":"""",""trackingCarrier"":""""}}]}}"
CreateFedExForm = "data=" & Application.EncodeURL(data) & "&action=trackpackages&locale=" & Application.EncodeURL(localeValue) & "&version=1&format=json"
End Function
GetDeliveryStatusForPackage
seems capable of returning the delivery status of a given trackingNumber
and localeValue
.TrackPackagesRequest.trackingInfoList
is an array -- and in the response, TrackPackagesResponse.packageList
is also an array). It's just a supposition/rational guess at this stage, but might be something that can potentially reduce how long your code takes to finish.Regarding nested keyStatus
property's value being "In transit"
for invalid tracking numbers, check property path TrackPackagesResponse.packageList[0].errorList[0]
, where there is an object. For invalid tracking numbers it seems to be {"code":"1041","message":"This tracking number cannot be found. Please check the number or contact the sender."...
-- and for valid tracking numbers, both the code
and message
properties appear to be zero-length strings.
It might be good to now get the VBA JSON module that I mention above, since there are two errorList
objects (at different levels of nesting) and you want to make sure you're accessing the correct one.
The change required in the code would probably be to first check if the code
and message
properties of TrackPackagesResponse.packageList[0].errorList[0]
indicate the tracking number is invalid (and return message
if invalid). Otherwise, return TrackPackagesResponse.packageList[0].keyStatus
. I don't have time to implement these changes right now. But I think it's something you can do (unless you're really unsure, in which case let me know which bit you need help with).
Upvotes: 1