Reputation: 43
I am working on a project to run some analytical models on NFL player stats. I have some code below that another user passed along to me. This code takes a list of links that I have on Sheet1, which is named "PlayerList", and creates a new tab for each player and pulls in their passing stats. All of the links are to Pro Football Reference. I am able to change this code to pull all necessary data for all positions other than quarterback. For the QBs I want to pull the passing stats table as well as the rushing and receiving stats table. Any help would be greatly appreciated. For reference here a few sample links:
https://www.pro-football-reference.com/players/R/RodgAa00.htm https://www.pro-football-reference.com/players/B/BreeDr00.htm
Below is the code:
Option Explicit
Public Sub GetInfo()
Di If InStr(links(link, 1), "https://") > 0 Then
Set html = GetHTMLDoc(links(link, 1))
Set hTable = html.getElementById("passing")
If Not hTable Is Nothing Then
playerName = GetNameAbbr(links(link, 1))
Set ws = AddPlayerSheet(playerName)
WriteTableToSheet hTable, ws
FixTable ws
End If
End If
Next
Application.ScreenUpdating = True
End Sub
Public Function GetHTMLDoc(ByVal url As String) As HTMLDocument
Dim sResponse As String, html As New HTMLDocument
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", url, False
.send
sResponse = StrConv(.responseBody, vbUnicode)
End With
sResponse = Mid$(sResponse, InStr(1, sResponse, "<!DOCTYPE "))
html.body.innerHTML = sResponse
Set GetHTMLDoc = html
End Function
Public Sub WriteTableToSheet(ByVal hTable As HTMLTable, ByVal ws As Worksheet)
Dim x As Long, y As Long
With hTable
For x = 0 To .Rows.Length - 1
For y = 0 To .Rows(x).Cells.Length - 1
If y = 6 Or y = 7 Then
ws.Cells(x + 4, y + 1).Value = Chr$(39) & .Rows(x).Cells(y).innerText
Else
ws.Cells(x + 4, y + 1).Value = .Rows(x).Cells(y).innerText
End If
Next y
Next x
End With
End Sub
Public Function GetNameAbbr(ByVal url As String)
Dim tempArr() As String
tempArr = Split(url, "/")
GetNameAbbr = Left$(tempArr(UBound(tempArr)), 6)
End Function
Public Function AddPlayerSheet(ByVal playerName As String) As Worksheet
Dim ws As Worksheet
If SheetExists(playerName) Then
Application.DisplayAlerts = False
ThisWorkbook.Worksheets(playerName).Delete
Application.DisplayAlerts = True
End If
Set ws = ThisWorkbook.Worksheets.Add
ws.Name = playerName
Set AddPlayerSheet = ws
End Function
Public Function SheetExists(ByVal playerName As String) As Boolean
SheetExists = Evaluate("ISREF('" & playerName & "'!A1)")
End Function
Public Sub FixTable(ByVal ws As Worksheet)
Dim found As Range, numSummaryRows As Long
With ws
Set found = .Columns("A").Find("Career")
If found Is Nothing Then Exit Sub
numSummaryRows = .Cells(.Rows.Count, "A").End(xlUp).Row - found.Row
numSummaryRows = IIf(numSummaryRows = 0, 1, numSummaryRows + 1)
Debug.Print found.Offset(, 1).Resize(numSummaryRows, 30).Address, ws.Name
found.Offset(, 1).Resize(numSummaryRows, 30).Copy found.Offset(, 2)
found.Offset(, 1).Resize(numSummaryRows, 1).ClearContents
End With
End Subm html As New HTMLDocument, links(), link As Long, wsSourceSheet As Worksheet
Dim hTable As HTMLTable, ws As Worksheet, playerName As String
Set wsSourceSheet = ThisWorkbook.Worksheets("PlayerList")
Application.ScreenUpdating = False
With wsSourceSheet
links = .Range("C2:C" & .Cells(.Rows.Count, "C").End(xlUp).Row).Value
End With
For link = LBound(links, 1) To UBound(links, 1)
Upvotes: 2
Views: 1652
Reputation: 84475
Yes there is a reason for doing this with VBA. In fact at least five.....
Whilst there may be ways to handle this with inbuilt tools, I love me a bit of powerquery, it is no longer "out of the box", but requires knowing how to code in M to some extent and/or reverting to using some VBA anyway.
If you tie this to a button on a sheet you can easily press to refresh when you want, link it to a workbook_open event to refresh on opening, even have windows scheduler open the workbook and refresh at certain times (just so you know VBA still got your back! Though maybe with a little help from my friends ♫ aka Windows).
It seems XHR is just a little too fast for the lower tables on each page, but do not despair, you could use Internet Explorer, with a short delay to ensure the Rushing & Receiving
table is populated, or, as I have, use Selenium to automate the browser (I have used Chrome but Internet Explorer is possible). Although this is slower than XHR, we can be a little more efficient by running a headless browser instance.
Here you go with VBA which will give you the different tabs as you go and select only those tables required. Based on links in at C2 on sheet1.
Option Explicit
Public Sub GetInfo()
Dim d As New ChromeDriver
Dim html As New HTMLDocument, links(), link As Long, wsSourceSheet As Worksheet, clipboard As Object
Dim hTablePass As HTMLTable, hTableRushReceive As HTMLTable, ws As Worksheet, playerName As String
Set wsSourceSheet = ThisWorkbook.Worksheets("Sheet1") '<change to sheet containing links
Application.ScreenUpdating = False
With wsSourceSheet
If .Cells(.Rows.Count, "C").End(xlUp).Row = 2 Then
ReDim links(1 To 1, 1 To 1): links(1, 1) = .Range("C2")
Else
links = .Range("C2:C" & .Cells(.Rows.Count, "C").End(xlUp).Row).Value
End If
End With
For link = LBound(links, 1) To UBound(links, 1)
If InStr(links(link, 1), "https://") > 0 Then
With d
.AddArgument "--headless"
.get links(link, 1)
html.body.innerHTML = .PageSource
Set hTablePass = html.querySelector("#all_passing #passing")
Set hTableRushReceive = html.querySelector("#all_rushing_and_receiving #rushing_and_receiving")
playerName = GetNameAbbr(links(link, 1))
Set ws = AddPlayerSheet(playerName)
Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
If Not hTablePass Is Nothing Then
clipboard.SetText Replace$(Replace$(hTablePass.outerHTML, "--></DIV>", vbNullString), "<!--", vbNullString)
clipboard.PutInClipboard
ws.Cells(GetLastRow(ws, 1), 1).PasteSpecial
End If
If Not hTableRushReceive Is Nothing Then
clipboard.SetText hTableRushReceive.outerHTML
clipboard.PutInClipboard
ws.Cells(GetLastRow(ws, 1) + 2, 1).PasteSpecial
End If
End With
End If
Next
Application.ScreenUpdating = True
End Sub
Public Function GetNameAbbr(ByVal url As String) As String
Dim tempArr() As String
tempArr = Split(url, "/")
GetNameAbbr = Left$(tempArr(UBound(tempArr)), 6)
End Function
Public Function AddPlayerSheet(ByVal playerName As String) As Worksheet
Dim ws As Worksheet
If SheetExists(playerName) Then
Application.DisplayAlerts = False
ThisWorkbook.Worksheets(playerName).Delete
Application.DisplayAlerts = True
End If
Set ws = ThisWorkbook.Worksheets.Add
ws.Name = playerName
Set AddPlayerSheet = ws
End Function
Public Function SheetExists(ByVal playerName As String) As Boolean '<== *@Rory
SheetExists = Evaluate("ISREF('" & playerName & "'!A1)")
End Function
Public Function GetLastRow(ByVal ws As Worksheet, Optional ByVal columnNumber As Long = 1) As Long
With ws
GetLastRow = .Cells(.Rows.Count, columnNumber).End(xlUp).Row
End With
End Function
References:
Selenium basic download:
*Function adapted from @Rory
Upvotes: 2
Reputation: 21657
Is there a reason you need to do this with VBA? Excel is quite capable of importing well-organized data such as the [several] tables on that page.
Under the Data tab, click From Web
and then enter the Website URL.
Next you will choose the table(s) that you want. Don't go nuts - only get what you need, but you can choose more than one tables by enabling the checkbox.
It can take a few long minutes to parse and organize all the data on the page...
Once you're back at the worksheet you'll see the queries on the right side. Right-click a query and choose Load To...
, then choose Table
and a location for the table data. There are a ton of other properties that you can customize; there are tutorials describing what you can do.
More things to customize are hidden in two ribbon tabs that only appear when you click on a table, Design and Query.
I think there's also a way to just create a list of players and then to use the Advanced
option when entering the URL to allow you to dynamically choose any player you want, while only adding the tables once... but I've never quite figured that part out yet.
I'm not a sports fan, but I assume the data will be changing throughout the season, and an advantage of using tables like this is that once you set up your worksheet how you want it, there are settings you can choose to auto-update every time you open the workbook, or on schedule, or manually, or never; whatever is appropriate.
Google "Excel web query" to find out more about the plethora of options available to you when using queries (aka: "Get & Transform") to extract and organize your data.
Perhaps this could be an alternative to consider instead of coding functionality that's already built-in to Excel.
Good luck, and "Go Sports!"
Upvotes: 4