Reputation: 501
There are many cases where I use the IE object for testing and then switch to MSXML for production. I understand that the two should produce identical responses, but for some reason the following code produces two different ones.
Sub testHTTP()
    Dim link As String
    link = "https://www.govtrack.us/congress/members/ralph_abraham/412630"

    'THIS IS THE XML SECTION
    Dim xmlHTMLDoc As HTMLDocument
    Dim xmlWeb As msxml2.XMLHTTP60
    Set xmlHTMLDoc = New HTMLDocument
    Set xmlWeb = New msxml2.XMLHTTP60
    xmlWeb.Open "GET", link, False
    xmlWeb.send
    While xmlWeb.readyState <> 4
        DoEvents
    Wend
    Debug.Print " "
    Debug.Print link
    Debug.Print xmlWeb.Status; "XMLHTTP status "; xmlWeb.statusText; " at "; Time
    xmlHTMLDoc.body.innerHTML = xmlWeb.responseText
    Debug.Print "MSXML response finds image tag at position: " & InStr(xmlWeb.responseText, "img")
    Debug.Print "MSXML response getElementsByTagName(img).Length is: " & xmlHTMLDoc.getElementsByTagName("img").Length

    'THIS IS THE IE SECTION
    Dim ieHTMLDoc As HTMLDocument
    Dim objIE As Object
    Set ieHTMLDoc = New HTMLDocument
    Set objIE = CreateObject("InternetExplorer.Application")
    With objIE
        .Top = 0
        .Left = 600
        .Width = 800
        .Height = 600
        .Visible = False
    End With
    objIE.navigate (link)
    While objIE.readyState <> 4
        DoEvents
    Wend
    If objIE.readyState = 4 Then
        Set ieHTMLDoc = objIE.document
        Debug.Print "IE response getElementsByTagName(img).Length is: " & ieHTMLDoc.getElementsByTagName("img").Length
    End If
End Sub
Here are the results from the immediate window:
https://www.govtrack.us/congress/members/ralph_abraham/412630
200 XMLHTTP status OK at 7:50:44 PM
MSXML response finds image tag at position: 8936
MSXML response getElementsByTagName(img).Length is: 0
IE response getElementsByTagName(img).Length is: 10
Here's another example, this time trying to find anchor links:
Sub testHTTP()
    Dim link As String
    link = "https://www.govtrack.us/congress/members/ralph_abraham/412630"

    'THIS IS THE XML SECTION
    Dim xmlHTMLDoc As HTMLDocument
    Dim xmlWeb As msxml2.XMLHTTP60
    Set xmlHTMLDoc = New HTMLDocument
    Set xmlWeb = New msxml2.XMLHTTP60
    xmlWeb.Open "GET", link, False
    xmlWeb.send
    While xmlWeb.readyState <> 4
        DoEvents
    Wend
    Debug.Print " "
    Debug.Print link
    Debug.Print xmlWeb.Status; "XMLHTTP status "; xmlWeb.statusText; " at "; Time
    xmlHTMLDoc.body.innerHTML = xmlWeb.responseText
    Debug.Print "MSXML response finds anchor tag at position: " & InStr(xmlWeb.responseText, "<a ")
    Debug.Print "MSXML response getElementsByTagName(<a ).Length is: " & xmlHTMLDoc.getElementsByTagName("a").Length

    'THIS IS THE IE SECTION
    Dim ieHTMLDoc As HTMLDocument
    Dim objIE As Object
    Set ieHTMLDoc = New HTMLDocument
    Set objIE = CreateObject("InternetExplorer.Application")
    With objIE
        .Top = 0
        .Left = 600
        .Width = 800
        .Height = 600
        .Visible = False
    End With
    objIE.navigate (link)
    While objIE.readyState <> 4
        DoEvents
    Wend
    If objIE.readyState = 4 Then
        Set ieHTMLDoc = objIE.document
        Debug.Print "IE response getElementsByTagName(<a ).Length is: " & ieHTMLDoc.getElementsByTagName("a").Length
    End If
End Sub
Here is the immediate window:
https://www.govtrack.us/congress/members/ralph_abraham/412630
200 XMLHTTP status OK at 12:21:08 PM
MSXML response finds anchor tag at position: 3774
MSXML response getElementsByTagName(<a ).Length is: 0
IE response getElementsByTagName(<a ).Length is: 131
Here is the method that blows up the code:
getElementsByClassName("photo")(0).getElementsByTagName("img")(0).src
This produces an "Object variable or With block variable not set" error when run against the XML request response, but not against the IE response. It looks like the XML response has everything in it but is not being interpreted properly as an HTMLDocument object. Possibly I could strip off some of the beginning of the response text and then load the rest into an HTMLDocument.
I need to know how to substitute XMLHTTP, or some other HTTP method in VBA, for IE.
Upvotes: 1
Views: 591
Reputation: 84465
The short answer might be that MSXML does away with images (and other info) which would otherwise be rendered when using a browser. Both responses are HTML, but you are dealing with different response text: an XMLHTTP request only returns what the server sends and does not have to supply a browser with all the additional rendering information for the page view.
N.B. The xmlWeb.responseText returns a DOMString that contains the response to the request as text, if successful.
It's not ideal, but you can extract the available src attribute strings from the responseText with a regex. You can tweak the pattern so it only matches image extensions, e.g. jpeg.
Option Explicit

Public Sub PrintSrcs()
    Dim sResponse As String, startPos As Long
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.govtrack.us/congress/members/ralph_abraham/412630", False
        .send
        sResponse = StrConv(.responseBody, vbUnicode)
    End With

    ' Drop anything that precedes the doctype declaration (skip if it is not found)
    startPos = InStr(1, sResponse, "<!DOCTYPE ")
    If startPos > 0 Then sResponse = Mid$(sResponse, startPos)

    Dim links As Variant, i As Long
    ' A lookbehind such as (?<=<img src=")[^"]* is not supported by VBScript.RegExp,
    ' so match the src=" prefix here and strip it inside GetLinks
    links = GetLinks(sResponse, "src=""[^""]*")
    If IsEmpty(links) Then Exit Sub   ' nothing matched

    For i = LBound(links) To UBound(links)
        Debug.Print links(i)
    Next i
End Sub

' Returns an array of matched src values, or Empty when the pattern finds nothing
Public Function GetLinks(ByVal inputString As String, ByVal sPattern As String) As Variant
    Dim Matches As Object, iMatch As Object, arrMatches(), i As Long
    With CreateObject("vbscript.regexp")
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        .Pattern = sPattern
        If .test(inputString) Then
            Set Matches = .Execute(inputString)
            ReDim arrMatches(0 To Matches.Count - 1)
            For Each iMatch In Matches
                arrMatches(i) = Replace$(iMatch.Value, "src=""", vbNullString)
                i = i + 1
            Next
            GetLinks = arrMatches
        Else
            Debug.Print "Failed"
        End If
    End With
End Function
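To restrict the matches to image files, one option is to make the pattern require a known extension just before the closing quote. The following is only a sketch under assumptions: GetImageSrcs and DemoImageSrcs are hypothetical names of mine, the extension list is a guess at what you might want, and the sample HTML string is made up so that the function can be tried without a network request.

```vba
' Hypothetical helper: returns the src values ending in a common image
' extension, or Empty when nothing matches.
Public Function GetImageSrcs(ByVal html As String) As Variant
    Dim matches As Object, results() As String, i As Long
    With CreateObject("VBScript.RegExp")
        .Global = True
        .IgnoreCase = True
        ' Capture group 1 holds the URL; the extension list is an assumption
        .Pattern = "src=""([^""]*\.(?:jpe?g|png|gif))"""
        Set matches = .Execute(html)
    End With
    If matches.Count = 0 Then Exit Function
    ReDim results(0 To matches.Count - 1)
    For i = 0 To matches.Count - 1
        results(i) = matches(i).SubMatches(0)
    Next i
    GetImageSrcs = results
End Function

' Made-up sample input: one image and one script reference
Public Sub DemoImageSrcs()
    Dim srcs As Variant, i As Long
    srcs = GetImageSrcs("<img src=""/a/photo.jpeg""><script src=""/b/app.js""></script>")
    If IsEmpty(srcs) Then Exit Sub
    For i = LBound(srcs) To UBound(srcs)
        Debug.Print srcs(i)
    Next i
End Sub
```

With the sample string above, only /a/photo.jpeg is printed, because the script src does not end in one of the listed extensions. Note that a URL with a query string after the extension (for example photo.jpeg?w=200) would not match this pattern, so adjust it if the page uses such URLs.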
Upvotes: 1
Reputation: 21639
I think comparing XML to IE is like comparing apples to a juice machine.
Internet Explorer is a web browser intended to simplify the process of requesting and receiving packages of information from remote servers, rendering them dynamically as necessary for our human devices.
XML is a machine-readable markup language used especially to display documents on the Internet. It's a metalanguage, meaning that it can be used to describe other languages.
XML defines the logical structure of documents and the way a document is accessed and manipulated; a plain-text method of organizing information.
Upvotes: 2