Reputation: 161
I am trying to pull some data from a HTML table, however I keep getting an ERROR Message 13 "Type Mismatch". It may be because i am using the wrong html tag. I have been struggling on this for a few days now, as I can not work out the correct process.
This is the HTML that I am using
The Code
Dim htmlTable As Object
Dim collTD As Collection
Dim oNode As Object
Dim IE As Object
Set IE = CreateObject("InternetExplorer.application")
With IE
.Visible = True
.navigate "https://ukonlinestores.co.uk/amazon-uk-sellers/"
' Wait for the page to fully load; you can't do anything if the page is not fully loaded
Do While .Busy Or _
.readyState <> 4
DoEvents
Loop
Set htmlTable = IE.document.getElementById("table_1")
Set collTD = htmlTable.getElementsByTagName("td")
For Each oNode In collTD
.Range("A" & RowCount) = oNode.innerHTML
RowCount = RowCount + 1
Next oNode
End With
UPDATE AS OF TODAY MON 12TH OCTOBER 2020
I have made a few changes to the code, as I wish to navigate the paginate table and extract the data off per page. I know the Pagination code that I added works as I am currently using it on another code, my problem is that it is not navigating the pages, possible due to wrong class or ID.
HTML for NEXT page
New Code, Updated
Dim htmlTable As Object
Dim collTD As Object
Dim oNode As Object
Dim IE As Object
Dim RowCount As Long
Dim currentColumn As Long
''''' NEW '''''''
Dim HTMLdoc As Object
Dim nextPageElements As Object
Dim pageNumber As Long
'''' START ROW 5 ''''''
RowCount = 5
currentColumn = 1
Set IE = CreateObject("InternetExplorer.application")
With IE
.Visible = True
.navigate "https://ukonlinestores.co.uk/amazon-uk-sellers/"
' Wait for the page to fully load; you can't do anything if the page is not fully loaded
Do While .readyState <> 4: DoEvents: Loop
Application.Wait (Now + TimeSerial(0, 0, 2))
StartForLoop_Restart:
Set htmlTable = .document.getElementById("table_1")
Set collTD = htmlTable.getElementsByTagName("td")
For Each oNode In collTD
If currentColumn Mod 11 = 0 Then
RowCount = RowCount + 1
currentColumn = 1
End If
Cells(RowCount, currentColumn) = oNode.innerHTML 'Change to inner HTML to pull of url
currentColumn = currentColumn + 1
Next oNode
''''' ######### NEXT PAGE AND DELAYS ###### added by me
Do
DoEvents
''''' Searches Number of Pages entered in Sheet3 A2
If pageNumber >= Replace(Worksheets("Sheet3").Range("A2").Value, " ", "+") Then Exit Do
On Error Resume Next
''' ################### CHANGE THE CLASS OF THE NEXT PAGE ########################## ERROR IS HERE ####
Set nextPageElements = HTMLdoc.getElementsByClassName("paginate_button next")(0)
'Set nextPageElements = HTMLdoc.getElementsID("table_1_next ")(0)
If nextPageElements Is Nothing Then Exit Do
''' ################### CHANGE THE CLASS OF THE NEXT PAGE ########################## ERROR IS ABOVE ####
''''' Scrolls Down the Browser
IE.document.parentWindow.Scroll 0&, 99999
'''''Random delay from Max number entered in
Application.Wait Now + TimeSerial(0, 0, Application.RandBetween(1, Worksheets("Sheet3").Range("B2").Value))
nextPageElements.Click 'next web page
Do While IE.Busy Or IE.readyState <> 4
DoEvents
Loop
''''Second Random delay from Max number entered
Application.Wait Now + TimeSerial(0, 0, Application.RandBetween(1, Worksheets("Sheet3").Range("C2").Value))
Set HTMLdoc = IE.document
pageNumber = pageNumber + 1
GoTo StartForLoop_Restart
Loop
End With
IE.Quit
Set IE = Nothing
Set HTMLdoc = Nothing
Set nextPageElements = Nothing
MsgBox "All Done"
Also need
Currently I don't thing the next page results are copied and pasted under the previous page results. It should be something like this. Color is only for your reff.
Upvotes: 0
Views: 215
Reputation: 2267
With late binding you don't need a reference.
Sub Test()
Dim htmlTable As Object
Dim collTD As Object
Dim oNode As Object
Dim IE As Object
Dim RowCount As Long
Dim currentColumn As Long
RowCount = 1
currentColumn = 1
Set IE = CreateObject("InternetExplorer.application")
With IE
.Visible = True
.navigate "https://ukonlinestores.co.uk/amazon-uk-sellers/"
' Wait for the page to fully load; you can't do anything if the page is not fully loaded
Do While .readyState <> 4: DoEvents: Loop
Application.Wait (Now + TimeSerial(0, 0, 2))
Set htmlTable = .document.getElementById("table_1")
Set collTD = htmlTable.getElementsByTagName("td")
For Each oNode In collTD
If currentColumn Mod 11 = 0 Then
RowCount = RowCount + 1
currentColumn = 1
End If
Cells(RowCount, currentColumn) = oNode.innertext
currentColumn = currentColumn + 1
Next oNode
End With
End Sub
Upvotes: 1
Reputation: 10226
You should properly declare your objects as IHTML elements in order to manipulate them and use their methods.
First add a reference to Microsoft HTML Object Library
Your code becomes
Sub test()
Dim htmlTable As IHTMLElement
Dim collTD As IHTMLElementCollection
Dim oNode As IHTMLElement
Dim IE As Object
Dim RowCount As Integer
Set IE = CreateObject("InternetExplorer.application")
With IE
.Visible = True
.navigate "https://ukonlinestores.co.uk/amazon-uk-sellers/"
' Wait for the page to fully load; you can't do anything if the page is not fully loaded
Do While .Busy Or _
.readyState <> 4
DoEvents
Loop
Set htmlTable = IE.document.getElementById("table_1")
Set collTD = htmlTable.getElementsByTagName("td")
RowCount = 1
For Each oNode In collTD
ThisWorkbook.Worksheets(1).Range("A" & RowCount) = oNode.innerHTML
RowCount = RowCount + 1
Next oNode
End With
End Sub
Upvotes: 2