Reputation: 347
I'm not a developer and have very limited XML knowledge but for what I've learned the past 3-4 days researching on the web. So apologies in advance for the basic level of this question. I'm trying to wrap up this one time task.
I have some VBA Excel knowledge and currently I'm trying to use VBA to extract the SIC code attribute from a given company's page on the SEC filing website. As an example, this is the site for Walmart
In the blue bar at the top you can see 'SIC: 5331' it's the 5331 I'm trying to return to a VBA variable so I can populate a spreadsheet. When I right click in IE and clich View Source the part of the page that is relevant reads in XML as:
<div id="contentDiv">
<!-- START FILER DIV -->
<div style="margin: 15px 0 10px 0; padding: 3px; overflow: hidden; background-color: #BCD6F8;">
<div class="mailer">Mailing Address
<span class="mailerAddress">702 SOUTHWEST 8TH STREET</span>
<span class="mailerAddress"> BENTONVILLE AR 72716 </span>
</div>
<div class="mailer">Business Address
<span class="mailerAddress">702 SOUTHWEST 8TH ST</span>
<span class="mailerAddress">BENTONVILLE AR 72716 </span>
<span class="mailerAddress">5012734000</span>
</div>
<div class="companyInfo">
<span class="companyName">WAL MART STORES INC <acronym title="Central Index Key">CIK</acronym>#: <a href="/cgi-bin/browse-edgar?action=getcompany&CIK=0000104169&owner=exclude&count=40">0000104169 (see all company filings)</a></span>
<p class="identInfo"><acronym title="Standard Industrial Code">SIC</acronym>: <a href="/cgi-bin/browse-edgar?action=getcompany&SIC=5331&owner=exclude&count=40">5331</a> - RETAIL-VARIETY STORES<br />State location: <a href="/cgi-bin/browse-edgar?action=getcompany&State=AR&owner=exclude&count=40">AR</a> | State of Inc.: <strong>DE</strong> | Fiscal Year End: 0131<br />(Assistant Director Office: 2)<br />Get <a href="/cgi-bin/own-disp?action=getissuer&CIK=0000104169"><b>insider transactions</b></a> for this <b> issuer</b>.
<br />Get <a href="/cgi-bin/own-disp?action=getowner&CIK=0000104169"><b>insider transactions</b></a> for this <b>reporting owner</b>.
</p>
</div>
</div>
</div>
In trying to understand how VBA might be used to extract the SIC, I found the following post on your site:
Query and parse xml attribute value into XLS using VBA
I tried to apply barrowc's answer by copy/paste into an Excel module and inserted the path to the Wal Mart filings however when I step through I get the Debug.Print "*****" but I'm not getting anything for n.Text.
Sub test4()
Dim d As MSXML2.DOMDocument60
Dim i As IXMLDOMNodeList
Dim n As IXMLDOMNode
Set d = New MSXML2.DOMDocument60
d.async = False
d.Load ("http://www.sec.gov/cgi-bin/browse-edgar?company=&match=&CIK=886475&filenum=&State=&Country=&SIC=&owner=exclude&Find=Find+Companies&action=getcompany")
Debug.Print "*****"
Set i = d.SelectNodes("//div[@id='contentDiv']")
For Each n In i
Debug.Print n.Text
Next n
Debug.Print "*****"
Set d = Nothing
End Sub
I've tried various strings in d.SelectNodes()
, but I don't know enough about this topic to understand where I'm going wrong. So either a comment on my syntax or a pointer to a resource would be enormously helpful.
Upvotes: 1
Views: 3407
Reputation: 347
Thanks again mwolfe. I've posted my code below but what you provided is much more elegant. I know that the SICs are only 4 digits so I was lazy and made that an assumption in the code and that could throw up errors in the future. You can see how I did it in the commented out part.
Sub GetSICs()
Application.ScreenUpdating = False
Dim AWBN As String
Dim ASN As String
Dim CIK As String
Dim NUM_FILES_TO_GET As Long
Dim COUNTER As Long
Dim SICTagPos As Integer
Dim SIC As String
Set IEbrowser = CreateObject("InternetExplorer.application")
IEbrowser.Visible = False
AWBN = ActiveWorkbook.Name
ASN = ActiveSheet.Name
Workbooks(AWBN).Sheets(ASN).Range("A1").Select
ActiveCell.Offset(0, 11) = "SIC"
NUM_FILES_TO_GET = Application.WorksheetFunction.CountA(Range("A:A"))
For COUNTER = 1 To 3 'NUM_FILES_TO_GET
Application.StatusBar = "Counter = " & COUNTER
'SICTagPos = 0
CIK = ActiveCell.Offset(COUNTER, 2)
IEbrowser.Navigate URL:="http://www.sec.gov/edgar/searchedgar/companysearch.html"
Do
DoEvents
Loop Until IEbrowser.readyState = 4
Set frm = IEbrowser.Document.forms(0)
frm("CIK").Value = CIK
frm.submit
While IEbrowser.Busy Or IEbrowser.readyState <> 4: DoEvents: Wend
SIC = ExtractSIC(IEbrowser.Document.body.innerhtml)
'SICTagPos = InStr(1, IEbrowser.Document.body.innerhtml, "SIC=")
'SIC = Right(Left(IEbrowser.Document.body.innerhtml, SICTagPos + 7), 4)
ActiveCell.Offset(COUNTER, 11).NumberFormat = "@"
ActiveCell.Offset(COUNTER, 11) = SIC
Next
Application.StatusBar = False
Application.ScreenUpdating = True
End Sub
Function ExtractSIC(SourceHtml As String) As String
Const PrefixChars As String = "&SIC="
Const SuffixChars As String = "&"
Dim StartPos As Long, EndPos As Long
StartPos = InStr(SourceHtml, PrefixChars)
If StartPos = 0 Then Exit Function
StartPos = StartPos + Len(PrefixChars)
EndPos = InStr(StartPos, SourceHtml, SuffixChars) - 1
ExtractSIC = Mid(SourceHtml, StartPos, EndPos - StartPos + 1)
End Function
Upvotes: 0
Reputation: 24207
If you are interested in just the SIC, it is not worth your time to try to parse the entire DOM structure. Instead, identify a unique set of characters, search for that and then extract the SIC from there.
The following function does just that. You just need to pass it the full HTML source of the page and it will return the SIC:
Function ExtractSIC(SourceHtml As String) As String
Const PrefixChars As String = "&SIC="
Const SuffixChars As String = "&"
Dim StartPos As Long, EndPos As Long
StartPos = InStr(SourceHtml, PrefixChars)
If StartPos = 0 Then Exit Function
StartPos = StartPos + Len(PrefixChars)
EndPos = InStr(StartPos, SourceHtml, SuffixChars) - 1
ExtractSIC = Mid(SourceHtml, StartPos, EndPos - StartPos + 1)
End Function
Upvotes: 1