Sharid
Sharid

Reputation: 161

HTML table data can not be extracted

I am trying to pull some data from a HTML table, however I keep getting an ERROR Message 13 "Type Mismatch". It may be because i am using the wrong html tag. I have been struggling on this for a few days now, as I can not work out the correct process.

This is the HTML that I am using

HTML Code

The Code

Dim htmlTable As Object
Dim collTD As Collection
Dim oNode As Object
Dim IE As Object

Set IE = CreateObject("InternetExplorer.application")
With IE
    .Visible = True
    .navigate "https://ukonlinestores.co.uk/amazon-uk-sellers/"

' Wait for the page to fully load; you can't do anything if the page is not fully loaded
Do While .Busy Or _
    .readyState <> 4
    DoEvents
Loop

Set htmlTable = IE.document.getElementById("table_1")
Set collTD = htmlTable.getElementsByTagName("td")

For Each oNode In collTD
    .Range("A" & RowCount) = oNode.innerHTML
     RowCount = RowCount + 1
    
Next oNode

End With

Error On Code

UPDATE AS OF TODAY MON 12TH OCTOBER 2020

I have made a few changes to the code, as I wish to navigate the paginate table and extract the data off per page. I know the Pagination code that I added works as I am currently using it on another code, my problem is that it is not navigating the pages, possible due to wrong class or ID.

WorkSheet Worksheet Image

HTML for NEXT page

Next Page HTML

New Code, Updated

Dim htmlTable As Object
Dim collTD As Object
Dim oNode As Object
Dim IE As Object
Dim RowCount As Long
Dim currentColumn As Long

''''' NEW '''''''
Dim HTMLdoc As Object
Dim nextPageElements As Object
Dim pageNumber As Long

'''' START ROW 5 ''''''
RowCount = 5
  currentColumn = 1
  
  Set IE = CreateObject("InternetExplorer.application")
  With IE
    .Visible = True
    .navigate "https://ukonlinestores.co.uk/amazon-uk-sellers/"
    
    ' Wait for the page to fully load; you can't do anything if the page is not fully loaded
    Do While .readyState <> 4: DoEvents: Loop
    Application.Wait (Now + TimeSerial(0, 0, 2))
    
    
StartForLoop_Restart:
    Set htmlTable = .document.getElementById("table_1")
    Set collTD = htmlTable.getElementsByTagName("td")
    
    For Each oNode In collTD
      If currentColumn Mod 11 = 0 Then
        RowCount = RowCount + 1
        currentColumn = 1
      End If
      
      Cells(RowCount, currentColumn) = oNode.innerHTML 'Change to inner HTML to pull of url
      currentColumn = currentColumn + 1
    Next oNode
   
   
''''' ######### NEXT PAGE AND DELAYS ######  added by me
 Do
  DoEvents

''''' Searches Number of Pages entered in Sheet3 A2
    If pageNumber >= Replace(Worksheets("Sheet3").Range("A2").Value, " ", "+") Then Exit Do
      On Error Resume Next
                      
''' ################### CHANGE THE CLASS OF THE NEXT PAGE ########################## ERROR IS HERE ####

    Set nextPageElements = HTMLdoc.getElementsByClassName("paginate_button next")(0)
    'Set nextPageElements = HTMLdoc.getElementsID("table_1_next ")(0)
    If nextPageElements Is Nothing Then Exit Do
    
''' ################### CHANGE THE CLASS OF THE NEXT PAGE ########################## ERROR IS ABOVE ####
               
''''' Scrolls Down the Browser
    IE.document.parentWindow.Scroll 0&, 99999
'''''Random delay from Max number entered in
    Application.Wait Now + TimeSerial(0, 0, Application.RandBetween(1, Worksheets("Sheet3").Range("B2").Value))
        nextPageElements.Click 'next web page
        Do While IE.Busy Or IE.readyState <> 4
        DoEvents
    Loop
''''Second Random delay from Max number entered
    Application.Wait Now + TimeSerial(0, 0, Application.RandBetween(1, Worksheets("Sheet3").Range("C2").Value))
        Set HTMLdoc = IE.document
        pageNumber = pageNumber + 1
GoTo StartForLoop_Restart
    Loop
    End With

IE.Quit
    Set IE = Nothing
    Set HTMLdoc = Nothing
    Set nextPageElements = Nothing

MsgBox "All Done"

Also need

Currently I don't thing the next page results are copied and pasted under the previous page results. It should be something like this. Color is only for your reff.

Result Layout

Upvotes: 0

Views: 215

Answers (2)

Zwenn
Zwenn

Reputation: 2267

With late binding you don't need a reference.

Sub Test()

Dim htmlTable As Object
Dim collTD As Object
Dim oNode As Object
Dim IE As Object
Dim RowCount As Long
Dim currentColumn As Long

  RowCount = 1
  currentColumn = 1
  
  Set IE = CreateObject("InternetExplorer.application")
  With IE
    .Visible = True
    .navigate "https://ukonlinestores.co.uk/amazon-uk-sellers/"
    
    ' Wait for the page to fully load; you can't do anything if the page is not fully loaded
    Do While .readyState <> 4: DoEvents: Loop
    Application.Wait (Now + TimeSerial(0, 0, 2))
    
    Set htmlTable = .document.getElementById("table_1")
    Set collTD = htmlTable.getElementsByTagName("td")
    
    For Each oNode In collTD
      If currentColumn Mod 11 = 0 Then
        RowCount = RowCount + 1
        currentColumn = 1
      End If
      
      Cells(RowCount, currentColumn) = oNode.innertext
      currentColumn = currentColumn + 1
    Next oNode
  End With
End Sub

Upvotes: 1

Thomas G
Thomas G

Reputation: 10226

You should properly declare your objects as IHTML elements in order to manipulate them and use their methods.

First add a reference to Microsoft HTML Object Library

Your code becomes

Sub test()

Dim htmlTable As IHTMLElement
Dim collTD As IHTMLElementCollection
Dim oNode As IHTMLElement
Dim IE As Object
Dim RowCount As Integer

Set IE = CreateObject("InternetExplorer.application")
With IE
    .Visible = True
    .navigate "https://ukonlinestores.co.uk/amazon-uk-sellers/"

' Wait for the page to fully load; you can't do anything if the page is not fully loaded
Do While .Busy Or _
    .readyState <> 4
    DoEvents
Loop

Set htmlTable = IE.document.getElementById("table_1")
Set collTD = htmlTable.getElementsByTagName("td")

RowCount = 1

For Each oNode In collTD
     ThisWorkbook.Worksheets(1).Range("A" & RowCount) = oNode.innerHTML
     RowCount = RowCount + 1
    
Next oNode

End With


End Sub

Upvotes: 2

Related Questions