Patrick Kennedy
Patrick Kennedy

Reputation: 23

VBA Web Scrape Get Elements by Class Name Type Mismatch Error

I am trying to pull some data from NHL stats into Excel with VBA using the following code, but am getting a type mismatch error. Any ideas?

Code:

Private Sub Hawks()

    Dim IE As New InternetExplorer
    Dim element As HTMLAnchorElement
    Dim elements As HTMLElementCollection

    IE.Visible = False
    IE.navigate "https://www.nhl.com/blackhawks/stats"

    Do
    DoEvents
    Loop Until IE.readyState = READYSTATE_COMPLETE

    Dim Doc As HTMLDocument
    Set Doc = IE.document
    Set elements = Doc.getElementsByClassName("name-col__firstName")

    Dim count As Long

    Dim erow As Long
    count = 0

    For Each element In sDD
    If element.className = "name-col__firstName" Then
    erow = Sheet1.Cells(Rows.count, 1).edn(xlUp).Offset(1, 0).Row
    Cells(erow, 1) = HTML.getElementsByTagName("span")(count).innerText
    count = count + 1
    End If
    Next element

End Sub

Upvotes: 2

Views: 6388

Answers (2)

QHarr
QHarr

Reputation: 84465

Your code:

Perhaps you merged separate bits of code but you have inconsistent use of variables. sDD I think should be elements, HTML should be Doc. The associated variable type declarations for element and elements should be:

Dim element As IHTMLSpanElement
Dim elements As IHTMLElementCollection

If elements is a collection all with the same class name then you don't need:

If element.className = "name-col__firstName" 

There are 40 matching elements for this class on the page, some of which repeat the same info as you haven't restricted to an individual table.

You are attempting to index into a span tag collection using the same index from your class name collection, but the span collection is in fact 1354 elements in length and the indices do not correspond on the page.

You want to target the table of interest only and the elements within it. I show you how later.

You also have a typo on this line:

erow = Sheet1.Cells(Rows.count, 1).edn(xlUp).Offset(1, 0).Row

It should be End(xlUp) .


Just names:

If you are only after the name info then I would use a descendant CSS combinator to target the table by its parent div element id, and then the actual names by the value of their class attribute. It is a fast method with a little further code optimization by switching off screenupdating.

The names all sit in a table whose id is skater-table. The CSS selector for that is #skater-table. The # denotes id. The names themselves, within this parent table id element all have a class attribute which contains the string value text. That is written as a CSS attribute = value selector of [class*=text]. The * denotes that the class name value contains the value text.

You can see a sample of the matched elements here:


VBA: Full name list.

Option Explicit
Public Sub GetHawksNamesInfo()
    Dim IE As InternetExplorer, playerList As Object, player As Long
    Application.ScreenUpdating = False
    Set IE = New InternetExplorer
    With IE
        .Visible = False
        .navigate "https://www.nhl.com/blackhawks/stats"
        While .Busy Or .readyState < 4: DoEvents: Wend

        Set playerList = IE.document.querySelectorAll("#skater-table [class*=text]")

        With ThisWorkbook.Worksheets("Sheet1")
            For player = 0 To playerList.Length - 1
                .Cells(player + 1, 1) = playerList.item(player).innerText
            Next
        End With
        .Quit
    End With
    Application.ScreenUpdating = True
End Sub

Entire Table:

You can grab the entire table, along with player pics, by copying to the clipboard and then pasting to the sheet using:

Option Explicit
Public Sub GetInfo()
    Dim IE As InternetExplorer, clipboard As Object
    Application.ScreenUpdating = False
    Set IE = New InternetExplorer
    With IE
        .Visible = False
        .navigate "https://www.nhl.com/blackhawks/stats"
        While .Busy Or .readyState < 4: DoEvents: Wend

        Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
        clipboard.SetText .document.querySelector("#skater-table table").outerHTML
        clipboard.PutInClipboard
        ThisWorkbook.Worksheets("Sheet1").Cells(1, 1).PasteSpecial

        .Quit
    End With
    Application.ScreenUpdating = True
End Sub

API:

The real gold mine for stats nerds is the API though. Whilst inspecting the HTML for the page I found this script which detailed client side values made available by an API. It seems likely that there was therefore a queryString based API call being made. In layman's terms, a set of values you can combine into a string, which you send to a web interface, that serves up a response containing all the stats data, in this case, in a format called JSON. APIs are generally great ways for clients to get data and is more reliable than web-scraping.

I decided to monitor the web traffic to see if an API call was made that I could grab. Bazinga! The following queryString based API call was made which returns a JSON response.

https://statsapi.web.nhl.com/api/v1/teams/16?hydrate=franchise(roster(season=20182019,person(name,stats(splits=[yearByYear]))))

Note: If you paste the above string into a FireFox browser and hit enter you can browse the JSON response.

Scrolling down in FireFox you can find Jersey number 19, for example, and view their info:


This exposes an enormous wealth of stats info returned as a JSON string. Here is just a peek at what is contained within (this isn't even all the info for the one player shown!):

XMLHTTP API call and JSON parsing:

You can avoid opening a browser altogether and issue a really fast XMLHTTP request against the API and get all this info in a JSON response you can then work with using a JSONParser.

There is simply too much info in the JSON to show you how to parse it all. Here is just an example of parsing all names from the response (note this is a full season list). After downloading and importing JSONConverter.bas from the link given, you need to go VBE > Tools > References > Add a reference to Microsoft Scripting Runtime.

Option Explicit
Public Sub GetInfo()
    Dim strJSON As String, json As Object
    Const URL  As String = "https://statsapi.web.nhl.com/api/v1/teams/16?hydrate=franchise(roster(season=20182019,person(name,stats(splits=[yearByYear]))))"

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", URL, False
        .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
        .send
        strJSON = .responseText
    End With
    Set json = JsonConverter.ParseJson(strJSON)("teams")(1)("franchise")("roster")("roster")
    Dim player As Object
    For Each player In json
        Debug.Print player("person")("fullName")
    Next
End Sub

Edit: There now seems to be problems with late bound clipboard reference in some cases. Here is generic early bound method where hTable is the target HTMLTable object.

For clipboard early bound go VBE > Tools > References > Microsoft-Forms 2.0 Object Library.

If you add a UserForm to your project, the library will get automatically added.

Dim clipboard As DataObject
Set clipboard = New DataObject
clipboard.SetText hTable.outerHTML
clipboard.PutInClipboard
ThisWorkbook.Worksheets("Sheet1").Cells(1, 1).PasteSpecial

Upvotes: 5

K.Dᴀᴠɪs
K.Dᴀᴠɪs

Reputation: 10139

This method loops the rows of your table rather than the Class Collection name-col__firstName.

I've test this and it appears to work.

Option Explicit

Private Sub Hawks()

    Dim IE As New InternetExplorer

    IE.Visible = False
    IE.navigate "https://www.nhl.com/blackhawks/stats"

    With IE
        Do While .Busy or .ReadyState < 4
            DoEvents
        Loop
    End With

    Dim doc As HTMLDocument
    Dim eRow As Long
    Dim htmlTbl As HTMLTable, tblRow As HTMLTableRow
    Set doc = IE.document
    Set htmlTbl = doc.getElementById("skater-table").getElementsByTagName( _
            "table")(0)

    For Each tblRow In htmlTbl.Rows
        If tblRow.RowIndex > 0 Then ' Skipping the table header
            eRow = Sheet1.Cells(Rows.count, 1).End(xlUp).Offset(1, 0).Row
            Cells(eRow, 1) = tblRow.getElementsByTagName("span")(0).innerText
        End If
    Next tblRow

    Rem: You may want to consider adding ie.quit

End Sub

Upvotes: 2

Related Questions