jeromekjerome

Reputation: 501

Why does VBA MSXML produce a different response from same URL than IE Object produces?

I often use the IE object for testing and then switch to MSXML for production. As I understand it, the two should produce identical responses, but for some reason the following code produces two different results.

Sub testHTTP()
    Dim link As String
    link = "https://www.govtrack.us/congress/members/ralph_abraham/412630"

'THIS IS THE XML SECTION
    Dim xmlHTMLDoc As HTMLDocument
    Dim xmlWeb As msxml2.XMLHTTP60
    Set xmlHTMLDoc = New HTMLDocument
    Set xmlWeb = New msxml2.XMLHTTP60

    xmlWeb.Open "GET", link, False
    xmlWeb.send

    While xmlWeb.readyState <> 4
        DoEvents
    Wend

    Debug.Print " "
    Debug.Print link
    Debug.Print xmlWeb.Status; "XMLHTTP status "; xmlWeb.statusText; " at "; Time

    xmlHTMLDoc.body.innerHTML = xmlWeb.responseText
    Debug.Print "MSXML response finds image tag at position: " & InStr(xmlWeb.responseText, "img")
    Debug.Print "MSXML response getElementsByTagName(img).Length is: " & xmlHTMLDoc.getElementsByTagName("img").Length

'THIS IS THE IE SECTION
    Dim ieHTMLDoc As HTMLDocument
    Dim objIE As Object
    Set ieHTMLDoc = New HTMLDocument
    Set objIE = CreateObject("InternetExplorer.Application")

    With objIE
        .Top = 0
        .Left = 600
        .Width = 800
        .Height = 600
        .Visible = False
    End With

    objIE.navigate (link)
    While objIE.readyState <> 4
        DoEvents
    Wend

    If objIE.readyState = 4 Then
        Set ieHTMLDoc = objIE.document
        Debug.Print "IE response getElementsByTagName(img).Length is: " & ieHTMLDoc.getElementsByTagName("img").Length
    End If

End Sub

Here are the results from the immediate window:

https://www.govtrack.us/congress/members/ralph_abraham/412630
200 XMLHTTP status OK at 7:50:44 PM 
MSXML response finds image tag at position: 8936
MSXML response getElementsByTagName(img).Length is: 0
IE response getElementsByTagName(img).Length is: 10

Here's another example, this time trying to find anchor links:

Sub testHTTP()
    Dim link As String
    link = "https://www.govtrack.us/congress/members/ralph_abraham/412630"

'THIS IS THE XML SECTION
    Dim xmlHTMLDoc As HTMLDocument
    Dim xmlWeb As msxml2.XMLHTTP60
    Set xmlHTMLDoc = New HTMLDocument
    Set xmlWeb = New msxml2.XMLHTTP60

    xmlWeb.Open "GET", link, False
    xmlWeb.send

    While xmlWeb.readyState <> 4
        DoEvents
    Wend

    Debug.Print " "
    Debug.Print link
    Debug.Print xmlWeb.Status; "XMLHTTP status "; xmlWeb.statusText; " at "; Time

    xmlHTMLDoc.body.innerHTML = xmlWeb.responseText
    Debug.Print "MSXML response finds anchor tag at position: " & InStr(xmlWeb.responseText, "<a ")
    Debug.Print "MSXML response getElementsByTagName(<a ).Length is: " & xmlHTMLDoc.getElementsByTagName("a").Length

'THIS IS THE IE SECTION
    Dim ieHTMLDoc As HTMLDocument
    Dim objIE As Object
    Set ieHTMLDoc = New HTMLDocument
    Set objIE = CreateObject("InternetExplorer.Application")

    With objIE
        .Top = 0
        .Left = 600
        .Width = 800
        .Height = 600
        .Visible = False
    End With

    objIE.navigate (link)
    While objIE.readyState <> 4
        DoEvents
    Wend

    If objIE.readyState = 4 Then
        Set ieHTMLDoc = objIE.document
        Debug.Print "IE response getElementsByTagName(<a ).Length is: " & ieHTMLDoc.getElementsByTagName("a").Length
    End If
End Sub

Here is the immediate window:

https://www.govtrack.us/congress/members/ralph_abraham/412630
200 XMLHTTP status OK at 12:21:08 PM 
MSXML response finds anchor tag at position: 3774
MSXML response getElementsByTagName(<a ).Length is: 0
IE response getElementsByTagName(<a ).Length is: 131

Here is the method that blows up the code:

getElementsByClassName("photo")(0).getElementsByTagName("img")(0).src    

This produces an "Object variable or With block variable not set" error when run against the document built from the XML request response, but not against the IE document. The XML response text appears to contain everything, but it is not being parsed properly into an HTMLDocument object. Possibly I could strip off the beginning of the response text and then load the rest into the HTMLDocument.

I need to know how to substitute MSXML, or some other HTTP method in VBA, for IE.

Upvotes: 1

Views: 591

Answers (2)

QHarr

Reputation: 84465

The short answer might be that the two are doing different jobs. MSXML returns the raw response text without loading images or running scripts, whereas a browser builds and renders a full document from it. Both responses are HTML, but the MSXML text has not gone through any of the browser's rendering steps.

N.B. The xmlWeb.responseText returns a DOMString that contains the response to the request as text, if successful.


Not ideal, but you can extract the available src attribute values from the responseText with a regex. You can tweak the pattern so that it only matches image extensions, e.g. jpeg.

Option Explicit
Public Sub PrintSrcs()
    Dim sResponse As String, html As New HTMLDocument
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.govtrack.us/congress/members/ralph_abraham/412630", False
        .send
        sResponse = StrConv(.responseBody, vbUnicode)
    End With
    sResponse = Mid$(sResponse, InStr(1, sResponse, "<!DOCTYPE "))
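    Dim links(), i As Long
    ' A lookbehind such as (?<=<img src=")[^"]* is not supported by VBScript.RegExp,
    ' so match the whole src="... prefix and strip it inside GetLinks.
    links = GetLinks(sResponse, "src=""[^""]*")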
    For i = LBound(links) To UBound(links)
        Debug.Print links(i)
    Next i
End Sub

Public Function GetLinks(ByVal inputString As String, ByVal sPattern As String) As Variant
    Dim Matches As Object, iMatch As Object, s As String, arrMatches(), i As Long
    With CreateObject("vbscript.regexp")
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        .Pattern = sPattern
        If .test(inputString) Then
            Set Matches = .Execute(inputString)
            For Each iMatch In Matches
                ReDim Preserve arrMatches(i)
                arrMatches(i) = Replace$(iMatch.Value, "src=""", vbNullString)
                i = i + 1
            Next
        Else
            Debug.Print "Failed"
        End If
    End With
    GetLinks = arrMatches
End Function
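
On the lookbehind mentioned in the comment in PrintSrcs: the VBScript.RegExp engine has no lookbehind, but a capturing group read through SubMatches gives a similar result and limits the matches to img tags. The following is only a minimal sketch, assuming the src attribute is double-quoted. GetImgSrcs and DemoImgSrcs are hypothetical names introduced for illustration, and the demo string is made up rather than taken from the govtrack page.

Public Function GetImgSrcs(ByVal inputString As String) As Collection
    Dim results As New Collection, m As Object
    With CreateObject("VBScript.RegExp")
        .Global = True
        .IgnoreCase = True
        ' Group 1 captures the value of a double-quoted src attribute inside an <img ...> tag.
        .Pattern = "<img\b[^>]*?\ssrc=""([^""]*)"""
        For Each m In .Execute(inputString)
            results.Add m.SubMatches(0)   ' SubMatches(0) is the first capturing group
        Next m
    End With
    Set GetImgSrcs = results              ' empty Collection when nothing matches
End Function

Public Sub DemoImgSrcs()
    Dim src As Variant
    ' Made-up sample markup: one img tag and one script tag that also has a src attribute.
    For Each src In GetImgSrcs("<p><img class=""photo"" src=""/a.jpeg""><script src=""x.js""></script></p>")
        Debug.Print src
    Next src
End Sub

With the sample string, only the img tag's src should be printed, and the script tag's src should be skipped. Returning a Collection rather than an array means the caller can loop with For Each even when there are no matches.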

Upvotes: 1

ashleedawg

Reputation: 21639

I think comparing XML to IE is like comparing apples to a juice machine.


Internet Explorer is a web browser, intended to simplify the process of requesting and receiving packets of information from remote servers, rendering them dynamically as necessary for human readers.


XML is a machine-readable markup language used especially to display documents on the Internet. It's a metalanguage, meaning that it can be used to describe other languages.

XML defines the logical structure of documents and the way a document is accessed and manipulated; a plain-text method of organizing information.


Upvotes: 2
