Reputation: 89
I'm trying to scrape the inside ownership from Morningstar at this url: http://investors.morningstar.com/ownership/shareholders-overview.html?t=TWTR®ion=usa&culture=en-US
This is the code I'm using:
Sub test()
Dim appIE As Object
Set appIE = CreateObject("InternetExplorer.Application")
With appIE
.Navigate "http://investors.morningstar.com/ownership/shareholders-overview.html?t=TWTR®ion=usa&culture=en-US"
.Visible = True
End With
While appIE.Busy
DoEvents
Wend
Set allRowOfData = appIE.Document.getElementById("currentInsiderVal")
Debug.Print allRowOfData
Dim myValue As String: myValue = allRowOfData.Cells(0).innerHTML
appIE.Quit
Set appIE = Nothing
Range("A30").Value = myValue
End Sub
I get run-time error 13 at line
Set allRowOfData = appIE.Document.getElementById("currentInsiderVal")
but I can't see any mismatch. What is going on?
Upvotes: 1
Views: 1013
Reputation: 12612
You can just do it with XHR and RegEx instead of cumbersome IE:
Sub Test()
Dim sContent
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "http://investors.morningstar.com/ownership/shareholders-overview.html?t=TWTR®ion=usa&culture=en-US", False
.Send
sContent = .ResponseText
End With
With CreateObject("VBScript.RegExp")
.Pattern = ",""currInsiderVal"":(.*?),"
Range("A30").Value = .Execute(sContent).Item(0).SubMatches(0)
End With
End Sub
Here is the description how the code works:
First of all MSXML2.XMLHTTP
ActiveX instance is created. GET request opened with target URL in synchronous mode (execution interrupts until response received).
Then VBScript.RegExp
is created. By default .IgnoreCase
, .Global
and .MultiLine
properties are False
. The pattern is ,"currInsiderVal":(.*?),
, where (.*?)
is a capturing group, .
means any character, .*
- zero or more characters, .*?
- as few as possible characters (lazy matching). Other characters in pattern to be found as is. .Execute
method returns a collection of matches, there is only one match object in it since .Global
is False
. This match object has a collection of submatches, there is only one submatch in it since the pattern contains the only capturing group.
There are some helpful MSDN articles on regex:
Microsoft Beefs Up VBScript with Regular Expressions
Introduction to Regular Expressions
Here is the description how I created the code:
First I found an element containing the target value on the webpage DOM using browser:
The corresponding node is:
<td align="right" id="currrentInsiderVal">143.51</td>
Then I made XHR and found this node in the response HTML, but it didn't contain the value (you can find response in the browser developer tools on network tab after you refresh the page):
<td align="right" id="currrentInsiderVal">
</td>
Such behavior is typical for DHTML. Dynamic HTML content is generated by scripts after the webpage loaded, either after retrieving a data from web via XHR or just processing already loaded withing webpage data. Then I just searched for the value 143.51
in the response, the snippet ,"currInsiderVal":143.51,
located within JS function:
fundsArr = {"fundTotalHistVal":132.61,"mutualFunds":[[1,89,"#a71620"],[2,145,"#a71620"],[3,152,"#a71620"],[4,198,"#a71620"],[5,155,"#a71620"],[6,146,"#a71620"],[7,146,"#a71620"],[8,132,"#a71620"]],"insiderHisMaxVal":3.535,"institutions":[[1,273,"#283862"],[2,318,"#283862"],[3,351,"#283862"],[4,369,"#283862"],[5,311,"#283862"],[6,298,"#283862"],[7,274,"#283862"],[8,263,"#283862"]],"currFundData":[2,2202,"#a6001d"],"currInstData":[1,4370,"#283864"],"instHistMaxVal":369,"insiders":[[5,0.042,"#ff6c21"],[6,0.057,"#ff6c21"],[7,0.057,"#ff6c21"],[8,3.535,"#ff6c21"],[5,0],[6,0],[7,0],[8,0]],"currMax":4370,"histLineQuars":[[1,"Q2"],[2,"Q3"],[3,"Q4"],[4,"Q1<br>2015"],[5,"Q2"],[6,"Q3"],[7,"Q4"],[8,"Q1<br>2016"]],"fundHisMaxVal":198,"currInsiderData":[3,143,"#ff6900"],"currFundVal":2202.85,"quarters":[[1,"Q2"],[2,""],[3,""],[4,"Q1<br>2015"],[5,""],[6,""],[7,""],[8,"Q1<br>2016"]],"insiderTotalHistVal":3.54,"currInstVal":4370.46,"currInsiderVal":143.51,"use10YearData":"false","instTotalHistVal":263.74,"maxValue":369};
So the regex pattern created based on that it should find the snippet ,"currInsiderVal":<some text>,
where <some text>
is our target value.
Upvotes: 1
Reputation: 4356
Had a look on the site and the element you are trying to retrieve has a typo in it; instead of currentInsiderVal
try using currrentInsiderVal
and you should retrieve the data correctly.
Probably worth considering some error trapping to catch stuff like this for any other fields you retrieve?
After your comment I took a closer look. Your issue seemed like it was trying to trap the id of the individual cell rather than navigating down the object tree. I've modified the code to retrieve the row of the table you are after and then set myValue to be the correct cell within that row. Seemed to be working when I tried it out. Give this a shot?
Sub test()
Dim appIE As Object
Set appIE = CreateObject("internetexplorer.application")
With appIE
.Navigate "http://investors.morningstar.com/ownership/shareholders-overview.html?t=TWTR®ion=usa&culture=en-US"
.Visible = True
End With
While appIE.Busy
DoEvents
Wend
Set allRowOfData = appIE.Document.getelementbyID("tableTest").getElementsByTagName("tbody")(0).getElementsByTagName("tr")(5)
myValue = allRowOfData.Cells(2).innerHTML
appIE.Quit
Set appIE = Nothing
Range("A30").Value = myValue
End Sub
Upvotes: 0