Stupid_Intern
Stupid_Intern

Reputation: 3450

How to scrape data from the following table format VBA

I am trying to scrape all the table from start page to end contents from this Webpage

Using the code below I can scrape the table contents of page 1 but I don't know how can I modify the code to get the data from start page to end.

Option Explicit
Sub NBAStats()
 Dim IE As Object, obj As Object
 Dim r As Integer, c As Integer, t As Integer
 Dim elemCollection As Object
 Set IE = CreateObject("InternetExplorer.Application")

 With IE
 .Visible = True
 .navigate ("http://stats.nba.com/league/player/#!/")
While IE.ReadyState <> 4
DoEvents
Wend
 Do While IE.busy: DoEvents: Loop
 ThisWorkbook.Sheet1.Clear

 Set elemCollection = IE.Document.getElementsByTagName("TABLE")

    For t = 0 To (elemCollection.Length - 1)

        For r = 0 To (elemCollection(t).Rows.Length - 1)
            For c = 0 To (elemCollection(t).Rows(r).Cells.Length - 1)
                ThisWorkbook.Worksheets(1).Cells(r + 1, c + 1) = elemCollection(t).Rows(r).Cells(c).innerText
            Next c
        Next r
    Next t

 End With
 Set IE = Nothing
 End Sub

Upvotes: 0

Views: 4165

Answers (2)

Axel Richter
Axel Richter

Reputation: 61890

At first, in my opinion VBA automation of the Internet Explorer is highly instable and not really practicable in productive use-cases. This means also scraping data from web sites which are provided only for viewing within a browser is not really practicable in productive use-cases. If you are entitled to use those data then you should ask for another data source (XML or JSONfor example). If you are not entitled then you should not do that. Possible the provider of the web site does not agree with this.

To be clear, I'm talking about web sites like this, which provides it's data with JavaScript only. If the data would be within the HTML then you could get those data via XMLHTTP. This is another thing.

I will nevertheless provide a "solution". So you can't simply think "He is simply unable to do this, so he is saying you should not do that."

So you must analyze the site and pick out the elements you can click for navigation.

Option Explicit
Sub NBAStats()
 Dim IE As Object
 Dim r As Long, c As Long, t As Long, rSheet As Long, rStart As Long
 Dim bReady As Boolean
 Dim elementsTable As Object
 Dim elementsPageNavRigth As Object
 Dim elemPageNavRigth As Object
 Dim elementsTableDiv As Object

 ThisWorkbook.Worksheets(1).Cells.Clear

 Set IE = CreateObject("InternetExplorer.Application")

 With IE
  .Visible = True
  .navigate ("http://stats.nba.com/league/player/#!/")
  Do While IE.busy
   DoEvents
  Loop

  rSheet = 0

  Do
   Do While elementsTableDiv Is Nothing
    Set elementsTableDiv = IE.Document.getElementsByClassName("table-responsive")
    DoEvents
   Loop

   Do While elementsTableDiv(0) Is Nothing
    DoEvents
   Loop

   Set elementsPageNavRigth = IE.Document.getElementsByClassName("page-nav right")
   Set elemPageNavRigth = elementsPageNavRigth(0)

   If elemPageNavRigth.className = "page-nav right disabled" Then bReady = True

   'If rSheet = 0 Then rStart = 0 Else rStart = 1

   Set elementsTable = elementsTableDiv(0).getElementsByTagName("TABLE")
   For r = rStart To (elementsTable(0).Rows.Length - 1)
    For c = 0 To (elementsTable(0).Rows(r).Cells.Length - 1)
     ThisWorkbook.Worksheets(1).Cells(r + rSheet + 1, c + 1) = elementsTable(t).Rows(r).Cells(c).innerText
    Next c
   Next r

   rSheet = rSheet + r

   If Not elemPageNavRigth Is Nothing Then elemPageNavRigth.Click

   Set elementsTableDiv = Nothing

  Loop Until bReady Or elemPageNavRigth Is Nothing

 End With
 Set IE = Nothing
End Sub

Upvotes: 1

OfficeTricks.Com
OfficeTricks.Com

Reputation: 108

Try to find the Sitemap.xml of the website you are scraping. The sitemap.xml fill have all the links present in the webpage.

Import that xml file to your Excel Sheet, Read each link & fetch each table in it.

Upvotes: 1

Related Questions