Reputation: 1337
I'm trying to declare an array of nodes (which isn't a problem) and then scrape the innerHTML
of two child nodes within each element of the array - taking SE as an example (an using the IE
object method), assume I was trying to scrape the Title and extract of questions on the home page, there is an array of node (class name: "question-summary").
There's then two children nodes (The tile - class name: "question-hyperlink" and the extract - class name: "excerpt") the code I'm using is a follows:
Sub Scraper()
Dim ie As Object
Dim doc As Object, oQuestionShells As Object, oQuestionTitle As Object, oQuestion As Object, oElement As Object
Dim QuestionShell As String, QuestionTitle As String, Question As String, sURL As String
Set ie = CreateObject("internetexplorer.application")
sURL = "https://stackoverflow.com/questions/tagged/excel-formula"
QuestionShell = "question-summary"
QuestionTitle = "question-hyperlink"
Question = "excerpt"
With ie
.Visible = False
.Navigate sURL
End With
Set doc = ie.Document 'Stepping through so doc is getting assigned (READY_STATE = 4)
Set oQuestionShells = doc.getElementsByClassName(QuestionShell)
For Each oElement In oQuestionShells
Set oQuestionTitle = oElement.getElementByClassName(QuestionTitle) 'Assigning this object causes an "Object doesn't support this property or method"
Set oQuestion = oElement.getElementByClassName(Question) 'Assigning this object causes an "Object doesn't support this property or method"
Debug.Print oQuestionTitle.innerHTML
Debug.Print oQuestion.innerHTML
Next
End Sub
Upvotes: 2
Views: 2287
Reputation: 19299
getElementByClassName
is not a method.
You can only use getElementsByClassName
(note the plural in the method name) which returns a IHTMLElementCollection
.
Using an Object
in place of IHTMLElementCollection
is fine - but you still have to access a specific element in the collection by giving an index.
Let's assume that for each oElement
that there is only one instance of the class question-summary
and one instance of the class question-hyperlink
. Then you can just use getElementsByClassName
and use (0)
at the end to pull out the first element of the array that is returned.
So your code correction is:
Set oQuestionTitle = oElement.getElementsByClassName(QuestionTitle)(0)
Set oQuestion = oElement.getElementsByClassName(Question)(0)
Full working code (with a few updates i.e. use Option Explicit
and wait for the page to load):
Option Explicit
Sub Scraper()
Dim ie As Object
Dim doc As Object, oQuestionShells As Object, oQuestionTitle As Object, oQuestion As Object, oElement As Object
Dim QuestionShell As String, QuestionTitle As String, Question As String, sURL As String
Set ie = CreateObject("internetexplorer.application")
sURL = "https://stackoverflow.com/questions/tagged/excel-formula"
QuestionShell = "question-summary"
QuestionTitle = "question-hyperlink"
Question = "excerpt"
With ie
.Visible = True
.Navigate sURL
Do
DoEvents
Loop While .ReadyState < 4 Or .Busy
End With
Set doc = ie.Document
Set oQuestionShells = doc.getElementsByClassName(QuestionShell)
For Each oElement In oQuestionShells
'Debug.Print TypeName(oElement)
Set oQuestionTitle = oElement.getElementsByClassName(QuestionTitle)(0)
Set oQuestion = oElement.getElementsByClassName(Question)(0)
Debug.Print oQuestionTitle.innerHTML
Debug.Print oQuestion.innerHTML
Next
ie.Quit
End Sub
Upvotes: 2