Marc Fernandes
Marc Fernandes

Reputation: 53

Need Excel VBA to navigate website and download specific files

Trying to understand how to interact with a website in a specific way. This is part of a larger code I'm working on that will loop through a list of ContractorIDs. What I need to do from here is the following:

  1. Navigate to this website: https://ufr.osd.state.ma.us/WebAccess/SearchDetails.asp?ContractorID=042786217&FilingYear=2018&nOrgPage=7&Year=2018

  2. Find the link that says "UFR Filing with Audited Financials" and click on it. (if it's not there, end the sub)

  3. On the ensuing page, find the link that is identified under "Document Category" as "UFR Excel Template" and click on it. (in this case, the link says "15-UFR18.xls", however since there's no consistent link naming scheme, the correct link will always have to be identified by the label under "Document Category" as mentioned. If the link doesn't exist, exit sub.)

  4. On the ensuing page, click the "Download" link at the top and save the file under the following file path (which would be created at this time): C:\Documents\042786217\2018.

Edit: Code below gets me to the point where the download button is clicked, then I get the Open/Save/Cancel dialog box. Nearly there, just need to figure out how to save the file into a specific path.

Option Explicit
Sub UFRScraper()

    If MsgBox("UFR Scraper will run now. Do you wish to continue?", vbYesNo) = vbNo Then Exit Sub

    Dim IE As Object
    Dim objElement As Object
    Dim objCollection As Object
    Dim ele As Object
    Dim tbl_Providers As ListObject: Set tbl_Providers = ThisWorkbook.Worksheets("tbl_ProviderList").ListObjects("tbl_Providers")
    Dim FEIN As String: FEIN = ""
    Dim FEINList As Range: Set FEINList = tbl_Providers.ListColumns("FEIN").DataBodyRange
    Dim ProviderName As String: ProviderName = ""
    Dim ProviderNames As Range: Set ProviderNames = tbl_Providers.ListColumns("Provider Name").DataBodyRange
    Dim FiscalYear As String: FiscalYear = ""
    Dim urlUFRDetails As String: urlUFRDetails = ""
    Dim i As Integer

    ' Create InternetExplorer Object
    Set IE = CreateObject("InternetExplorer.Application")

    ' Show (True)/Hide (False) IE
    IE.Visible = True

    i = 1
    For i = 1 To 3 'Limited to 3 during testing. Change when ready.
        FEIN = FEINList(i, 1)
        ProviderName = ProviderNames(i, 1)

        urlUFRDetails = "https://ufr.osd.state.ma.us/WebAccess/SearchDetails.asp?ContractorID=" & FEIN & "&FilingYear=2018&nOrgPage=1&Year=2018"

        IE.Navigate urlUFRDetails

        ' Wait while IE loading...
        'IE ReadyState = 4 signifies the webpage has loaded (the first loop is set to avoid inadvertently skipping over the second loop)
        Do While IE.ReadyState = 4: DoEvents: Loop   'Do While
        Do Until IE.ReadyState = 4: DoEvents: Loop   'Do Until


        'Step 2 is done here
        Dim filingFound As Boolean: filingFound = False
        For Each ele In IE.Document.getElementsByTagName("a")
            If ele.innerText = "UFR Filing with Audited Financials" Then
                filingFound = True
                IE.Navigate ele.href
                Do While IE.ReadyState = 4: DoEvents: Loop   'Do While
                Do Until IE.ReadyState = 4: DoEvents: Loop   'Do Until
                Exit For
            End If
        Next ele

        If filingFound = False Then
            GoTo Skip
        End If


        'Step 3
        Dim j As Integer: j = 0
        Dim UFRFileFound As Boolean: UFRFileFound = False
        For Each ele In IE.Document.getElementsByTagName("li")
            j = j + 1
            If ele.innerText = "UFR Excel Template" Then
                UFRFileFound = True
                IE.Navigate "https://ufr.osd.state.ma.us/WebAccess/documentviewact.asp?counter=" & j - 4
                Do While IE.ReadyState = 4: DoEvents: Loop   'Do While
                Do Until IE.ReadyState = 4: DoEvents: Loop   'Do Until
                Exit For
            End If
        Next ele

        If UFRFileFound = False Then
            GoTo Skip
        End If


        'Step 4
        IE.Document.getElementById("LinkButton2").Click

        '**Built in wait time to avoid accidentally overloading server with repeated quick requests during development and testing**
Skip:
        Application.Wait (Now + TimeValue("0:00:03"))
        MsgBox "Loop " & i & " complete."

    Next i

    'Unload IE
    IE.Quit
    Set IE = Nothing
    Set objElement = Nothing
    Set objCollection = Nothing

    MsgBox "Process complete!"

End Sub

Upvotes: 1

Views: 646

Answers (1)

Ahmed AU
Ahmed AU

Reputation: 2777

I have tried step 3 with some what lengthy way. but could not provide complete download code as (after one successful manual attempt) at present even manual download attempt causing massage "The File Could Not Be Retrieved" (maybe server side constrain)

Code only take you down to the cell containing href of the xlx file

 Dim doc As HTMLDocument
        Dim Tbl As HTMLTable, Cel As HTMLTableCell, Rw As HTMLTableRow, Col As HTMLTableCol
        Set doc = IE.document

        For Each ele In IE.document.getElementsByClassName("boxedContent")
            For Each Tbl In ele.getElementsByTagName("table")
               For Each Rw In Tbl.Rows
                    For Each Cel In Rw.Cells
                    'Debug.Print Cel.innerText
                        If InStr(1, Cel.innerText, "UFR Excel Template") > 0 Then
                        Debug.Print Rw.Cells(2).innerText & " - " & Rw.Cells(2).innerHTML
                        End If
                    Next
               Next Rw
            Next Tbl
        Next

Once the href is available PtrSafe Function or WinHTTPrequest or other methods could be used to download the file. Welcome and eager to learn some more efficient answers in this case from experts like @QHarr and others.

Upvotes: 1

Related Questions