Reputation: 15
I'm trying to understand why my references arent working well to scrape this data.
Here is the site as an example:
http://quote.morningstar.ca/Quicktakes/Financials/is.aspx?t=GNTX®ion=USA&culture=en-CA&ops=clear
And as a target:
<div id="data_i6" class="rf_crow"><div id="Y_1" class="pos column6Width_noChart116px" style="overflow:hidden;white-space: nowrap;" rawvalue="741131269">741</div><div id="Y_2" class="pos column6Width_noChart116px" style="overflow:hidden;white-space: nowrap;" rawvalue="836611464">837</div><div id="Y_3" class="pos column6Width_noChart116px" style="overflow:hidden;white-space: nowrap;" rawvalue="939841654">940</div><div id="Y_4" class="pos column6Width_noChart116px" style="overflow:hidden;white-space: nowrap;" rawvalue="1010472512">1,010</div><div id="Y_5" class="pos column6Width_noChart116px" style="overflow:hidden;white-space: nowrap;" rawvalue="1100344312">1,100</div><div id="Y_6" class="pos column6Width_noChart116px" style="overflow:hidden;white-space: nowrap;" rawvalue="1115401551">1,115</div></div>
What I need to extract is the actual value in rawvalue="741131269" and the following is what I've gotten to work so far.
'Cells(1, 1) = Document.getElementsByClassName("rf_crow")'returns the rows of data into one cell
'Cells(1, 1) = Document.getElementById("Y_1").innerText 'returns the text for the year
'Cells(1, 1) = Document.getElementById("data_i1").innerText 'returns to first row of data
I know the above doesn't return what I want, because the comment tells you what it extracts into Excel. The sub-element doesn't seem to work as it does in other macros I've built. I thought something like this would work:
Cells(1, 1) = Document.getElementById("Y_1").getAttribute("rawvalue")
but that doesn't work, also, I tried:
Cells(1, 1) = Document.getElementById("data_i6").getElementById("Y_1").innertext
and that doesn't work either.
Upvotes: 0
Views: 143
Reputation: 20322
Does this work for you?
Sub web_table_option_two()
Dim HTMLDoc As New HTMLDocument
Dim objTable As Object
Dim lRow As Long
Dim lngTable As Long
Dim lngRow As Long
Dim lngCol As Long
Dim ActRw As Long
Dim objIE As InternetExplorer
Set objIE = New InternetExplorer
objIE.Navigate "http://quote.morningstar.ca/Quicktakes/Financials/is.aspx?t=GNTX®ion=USA&culture=en-CA&ops=clear"
Do Until objIE.ReadyState = 4 And Not objIE.Busy
DoEvents
Loop
Application.Wait (Now + TimeValue("0:00:03")) 'wait for java script to load
HTMLDoc.body.innerHTML = objIE.Document.body.innerHTML
With HTMLDoc.body
Set objTable = .getElementsByTagName("table")
For lngTable = 0 To objTable.Length - 1
For lngRow = 0 To objTable(lngTable).Rows.Length - 1
For lngCol = 0 To objTable(lngTable).Rows(lngRow).Cells.Length - 1
ThisWorkbook.Sheets("Sheet1").Cells(ActRw + lngRow + 1, lngCol + 1) = objTable(lngTable).Rows(lngRow).Cells(lngCol).innerText
Next lngCol
Next lngRow
ActRw = ActRw + objTable(lngTable).Rows.Length + 1
Next lngTable
End With
objIE.Quit
End Sub
Upvotes: 0
Reputation: 22440
The solution is very easy. Just call it using it's attribute which is `rawvalue.
This is how you can go:
Using Hardcoded delay
and for loop
to check the availability of the desired value:
Sub GetValue()
Dim IE As New InternetExplorer, HTML As HTMLDocument, post As Object, elem As Object
With IE
.Visible = True
.Navigate "http://quote.morningstar.ca/Quicktakes/Financials/is.aspx?t=GNTX®ion=USA&culture=en-CA&ops=clear"
While .Busy = True Or .ReadyState < 4: DoEvents: Wend
Set HTML = .Document
End With
''using hardcoded delay
Application.Wait Now + TimeValue("00:00:05")
For Each elem In HTML.getElementsByTagName("div")
If elem.innerText = "741" Then MsgBox elem.getAttribute("rawvalue"): Exit For
Next elem
End Sub
Using Explicit Wait
:
Sub GetValue()
Dim IE As New InternetExplorer, HTML As HTMLDocument, post As Object
With IE
.Visible = True
.Navigate "http://quote.morningstar.ca/Quicktakes/Financials/is.aspx?t=GNTX®ion=USA&culture=en-CA&ops=clear"
While .Busy = True Or .ReadyState < 4: DoEvents: Wend
Set HTML = .Document
End With
Do: Set post = HTML.querySelector("#data_i6 #Y_1"): DoEvents: Loop While post Is Nothing
MsgBox post.getAttribute("rawvalue")
End Sub
Output at this moment:
741131269
Upvotes: 1
Reputation: 84465
The following should illuminate some of the problems you were having.
.querySelectorAll
The exact element you mention is the second index returned by .querySelectorAll
method of .document
using the CSS selector #Y_1
. The #
means Id.
From that webpage it returns the following (sample shown - not all):
From the above you can see the string you want is returned by the index of 2 in the result.
querySelectorAll with Id? Isn't Id a unique identifier for a single element?
This Id, unexpectedly, is not unique to a single element on the page. It occurs a whopping 27 times:
This means you can use the .querySelectorAll
method to return a nodeList of all matching items and take the item at index 2 to get your result.
Note:
If you want the long number next to rawvalue
, 741131269
, then parse the outerHTML
of the returned element.
Debug.Print Replace(Split(Split(a.item(2).outerHTML, "rawvalue=")(1), ">")(0), Chr$(34), vbNullString)
.querySelector
Alternatively, you can target the id which is specific data_i6
with
.document.querySelector("#data_i6")
This CSS selector (#data_i6) returns the entire row as it has each year within. If using .querySelector
you will only get the first item anyway which is year 1.
You can be more specific with the CSS selector and add the additional year Id to get just the year of interest:
#data_i6 #Y_1
Code: (querySelector method commented out next to querySelectorAll)
Option Explicit
Public Sub Get_Information()
Dim IE As New InternetExplorer
With IE
.Visible = True
.navigate "http://quote.morningstar.ca/Quicktakes/Financials/is.aspx?t=GNTX®ion=USA&culture=en-CA&ops=clear"
While .Busy = True Or .readyState < 4: DoEvents: Wend
Dim a As Object, exitTime As Date
exitTime = Now + TimeSerial(0, 0, 2)
Do
DoEvents
On Error Resume Next
Set a = .document.querySelectorAll("#Y_1") ' .document.querySelector("#data_i6 #Y_1")
On Error GoTo 0
If Now > exitTime Then Exit Do
Loop While a Is Nothing
If a Is Nothing Then Exit Sub
Debug.Print Split(Split(a.item(2).innerText, "rawvalue=")(0), ">")(0) 'Split(Split(a.innerText, "rawvalue=")(0), ">")(0)
Debug.Print Replace(Split(Split(a.item(2).outerHTML, "rawvalue=")(1), ">")(0), Chr$(34), vbNullString) 'Replace(Split(Split(a.outerHTML, "rawvalue=")(1), ">")(0), Chr$(34), vbNullString)
.Quit
End With
End Sub
Upvotes: 0
Reputation: 306
Try and declare "objCollection" as an object, strValue as string, and in the code below, replace in the first line the name of the http-request you declared:
Document.body.innerHTML = YourHTTPRequest.responseText
Set objCollection = Document.getElementsByClassName("rf_crow")
For Each objElement In objCollection
If objElement.ID = "Y_1" Then
strValue = objElement.getAttribute("rawvalue")
Exit For
End If
Next
Cells(1, 1) = strValue
Upvotes: 0