Reputation: 271
In the following code we are using IE Automation to get from here
"https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=" & Ticker & "&type=10-Q&dateb=&owner=exclude&count=20"
to a location like this
https://www.sec.gov/Archives/edgar/data/10795/000119312514042815/bdx-20131231.xml
Is there a way to go from location 1 to location 2 without using IE Automation and finding something more reliable, secure and faster?
For reasons of completeness here is the full code we have by now; by running you will see heavy use of IE:
Option Explicit
Sub MadMule2()
Dim IE As InternetExplorer
Dim el
Dim els
Dim colDocLinks As New Collection
Dim Ticker As String
Dim lnk
Dim intCounter as Integer
Set IE = New InternetExplorer
IE.Visible = False
Ticker = Worksheets("Sheet1").Range("A1").Value
LoadPage IE, "https://www.sec.gov/cgi-bin/browse-edgar?" & _
"action=getcompany&CIK=" & Ticker & "&type=10-Q" & _
"&dateb=&owner=exclude&count=20"
Set els = IE.document.getElementsByTagName("a")
For Each el In els
If Trim(el.innerText) = "Documents" Then
colDocLinks.Add el.href
End If
Next el
intCounter = 1
For Each lnk In colDocLinks
LoadPage IE, CStr(lnk)
For Each el In IE.document.getElementsByTagName("a")
If el.href Like "*[0-9].xml" Then
ActiveWorkbook.XmlMaps.Add(el, "xbrl").Name = "xbrl Map"
End If
Next el
Next lnk
End Sub
Sub LoadPage(IE As InternetExplorer, URL As String)
IE.navigate URL
Do While IE.Busy Or IE.readyState <> READYSTATE_COMPLETE
DoEvents
Loop
End Sub
ADDITIONAL
Q: Is there a way to go from location 1 to location 2 without using IE Automation and finding something more reliable, secure and faster?
Can you expand on this?
By mehow
A: Here is a comment for the code block we have received by user2140261 here:
You should look into MSXML it is much faster, secure, and reliable then IE automation.
Since the code opens Internet Explorer, parses the source page to find the href and get to the Web location needed; we wondered if there is a way to go into location 2 without the use of IE. Can it be done with MSXML as user2140261 states?
Upvotes: 1
Views: 1148
Reputation: 84465
Here is an example using XHR
Option Explicit
Public Sub GetLinks()
Dim ticker As String, html As New HTMLDocument, links As Object, i As Long
ticker = [A1] 'example is 81251
Set html = GetHTMLDocument("https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=" & ticker)
If html Is Nothing Then Exit Sub
Set links = html.querySelectorAll("#documentsbutton")
If links Is Nothing Then Exit Sub
For i = 0 To links.Length - 1
Debug.Print GetAbsoluteURL(links(i).getAttribute("href"))
Next i
End Sub
Public Function GetAbsoluteURL(ByVal relativeURL As String) As String
If InStr(relativeURL, "about:/") > 0 Then
GetAbsoluteURL = Replace$(relativeURL, "about:/", "https://www.sec.gov/")
Else
GetAbsoluteURL = relativeURL
End If
End Function
Public Function GetHTMLDocument(ByVal URL As String) As HTMLDocument
Dim sResponse As String
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", URL, False
.send
If .Status <> 200 Then
Exit Function
Else
sResponse = StrConv(.responseBody, vbUnicode)
End If
End With
Set GetHTMLDocument = New HTMLDocument
GetHTMLDocument.body.innerHTML = Mid$(sResponse, InStr(1, sResponse, "<!DOCTYPE "))
End Function
Sample ticker is 81251
Sample output:
CSS selectors and .querySelectorAll
You can target the documents buttons with a CSS selector of "#documentsbutton"
.
Now, whilst #
means id, and id is usually unique, for this particular website it seems that the same id is used to identify all elements within a table.
The CSS selector query returns the following (sample) from the page:
Applying the CSS selector in VBA:
As more than one element is matched, the .querySelectorAll
method of document
is used to return a nodeList
of items matched by the css selector:
html.querySelectorAll("#documentsbutton")
We can traverse the nodeList
, along its .Length
, and access individual document button
elements by index (position along nodeList; starting at 0):
For i = 0 To links.Length - 1
As you want the hyperlinked info you can simply pull the href
attribute from each matched element:
links(i).getAttribute("href")
This returns a relative path so we use a tiny helper function, GetAbsoluteURL
, to get the absolute path.
Closing notes:
Whilst not completely adhering to the Single Responsibility Principle this does show some of the benefits of re-factoring your code. You can then re-use GetHTMLDocument
function to handle the new URLs returned by GetAbsoluteURL
Upvotes: 1