bdbart
bdbart

Reputation: 1

Web-scraping from Excel List of PDGA Numbers using VBA

I have a list of numbers (PDGA Numbers) in MS Excel. I would like to automatically search the PDGA website (https://www.pdga.com/players/) from the list and automatically paste the player's location next to the corresponding PDGA Number. Currently, I am able to search the number and paste the location individually, but not the entire list.

First I select an excel cell and 'Define Name' as PDGA, and another as Location. https://i.sstatic.net/oRBw1.jpg

Then I basically followed this YouTube video. https://www.youtube.com/watch?v=7sZRcaaAVbg

And ultimately got this VBA code to work. (Make sure the proper VBA References are checked) https://i.sstatic.net/ADDcv.jpg

Private Sub Worksheet_Change(ByVal Target As Range)
 If Target.Column = Range("PDGA").Column Then
 Dim IE As New InternetExplorer
    IE.Visible = False
            IE.navigate "https://www.pdga.com/player/" & Range("PDGA").Value
    Do
    DoEvents
    Loop Until IE.readyState = READYSTATE_COMPLETE
    Dim Doc As HTMLDocument
    Set Doc = IE.document
    Dim sDD As String
    sDD = Trim(Doc.getElementsByTagName("li")(127).outerText)
    Range("Location").Value = sDD

    End If
   End Sub

I think I need some For Each loop, but I'm not sure. It should look like this when completed. https://i.sstatic.net/JzdrX.jpg

Thanks in advance for any help.

Upvotes: 0

Views: 131

Answers (3)

SIM
SIM

Reputation: 22440

You can achieve your desired output in several ways. Here is one of such.

Sub FetchData()
    Dim Http As New XMLHTTP60, Html As New HTMLDocument
    Dim post As HTMLDivElement, Idic As New Scripting.Dictionary
    Dim key As Variant, N$, CT$, S$, C$, R&

    With Http
        .Open "GET", "https://www.pdga.com/players/", False
        .send
        Html.body.innerHTML = .responseText
    End With

    For Each post In Html.querySelector("table.views-table tbody").getElementsByTagName("tr")
        N = post.querySelector("a[title]").innerText
        CT = post.querySelector(".city").innerText
        S = post.querySelector(".state").innerText
        C = post.querySelector(".country").innerText
        Idic(N & "|" & CT & " " & S & " " & C) = 1
    Next post

    For Each key In Idic.Keys
        R = R + 1: Cells(R, 1) = Split(key, "|")(0)
        Cells(R, 2) = Split(key, "|")(1)
    Next key
End Sub

Reference to add to the library:

Microsoft XML, v6.0
Microsoft HTML Object Library
Microsoft Scripting Runtime

Upvotes: 1

QHarr
QHarr

Reputation: 84465

If you have a specific list of players then you loop and issue XHR requests to get the info. Here I have the PDGA# in an array which is looped:

playerPDGA = Array(1, 5, 23, 46, 789, 567)

Code:

Option Explicit
Public Sub GetInfo()
    Dim sResponse As String, html As New HTMLDocument, playerPDGA(), results(), i As Long
    playerPDGA = Array(1, 5, 23, 46, 789, 567)
    ReDim results(0 To UBound(playerPDGA), 0 To 1)
    With CreateObject("MSXML2.XMLHTTP")
        For i = LBound(playerPDGA) To UBound(playerPDGA)
            .Open "GET", "https://www.pdga.com/player/" & playerPDGA(i), False
            .send
            sResponse = StrConv(.responseBody, vbUnicode)
            sResponse = Mid$(sResponse, InStr(1, sResponse, "<!DOCTYPE "))
            With html
                .body.innerHTML = sResponse
                results(i, 0) = .querySelector(".pane-content > h1").innerText
                results(i, 1) = .querySelector(".location").innerText
            End With
        Next i
    End With
    ActiveSheet.Cells(1, 1).Resize(UBound(results, 1) + 1, UBound(results, 2) + 1) = results
End Sub

For any page listing players:

Option Explicit
Public Sub GetInfo()
    Dim sResponse As String, html As New HTMLDocument
    Application.ScreenUpdating = False
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.pdga.com/players/", False
        .send
        sResponse = StrConv(.responseBody, vbUnicode)
    End With

    sResponse = Mid$(sResponse, InStr(1, sResponse, "<!DOCTYPE "))
    Dim nameList As Object, cityList As Object, stateList As Object, countryList As Object, r As Long
    With html
        .body.innerHTML = sResponse
        Set nameList = .querySelectorAll(".views-field.views-field-nothing")
        Set cityList = .querySelectorAll(".views-field.views-field-City.city")
        Set stateList = .querySelectorAll(".views-field.views-field-StateProv.state")
        Set countryList = .querySelectorAll(".views-field.views-field-Country.country")

    End With
    With ActiveSheet
        Dim i As Long
        For i = 0 To nameList.Length - 1
            r = r + 1
            .Cells(r, 1) = nameList.item(i).innerText
            .Cells(r, 2) = Trim$(cityList.item(i).innerText & Chr$(32) & stateList.item(i).innerText & Chr$(32) & countryList.item(i).innerText)
        Next i
    End With
    Application.ScreenUpdating = True
End Sub

Reference:

HTML Object library

Upvotes: 2

Santosh
Santosh

Reputation: 12353

enter image description here

Sub test()

    Dim IE As New InternetExplorer
    Dim Doc As HTMLDocument
    Dim lastRow As Long, i As Long
    Dim sDD As String

    IE.Visible = False

    lastRow = Range("A" & Rows.Count).End(xlUp).Row
    For i = 1 To lastRow

        IE.navigate "https://www.pdga.com/player/" & Range("PDGA").Cells(i).Value

        Do
        DoEvents
        Loop Until IE.readyState = READYSTATE_COMPLETE

        Set Doc = IE.document

        sDD = Trim(Doc.getElementsByTagName("li")(127).outerText)
        Range("Location").Cells(i) = sDD
    Next

     Set IE = Nothing
     Set Doc = Nothing
End Sub

Upvotes: 0

Related Questions