Reputation: 53
Trying to understand how to interact with a website in a specific way. This is part of a larger code I'm working on that will loop through a list of ContractorIDs. What I need to do from here is the following:
Navigate to this website: https://ufr.osd.state.ma.us/WebAccess/SearchDetails.asp?ContractorID=042786217&FilingYear=2018&nOrgPage=7&Year=2018
Find the link that says "UFR Filing with Audited Financials" and click on it. (if it's not there, end the sub)
On the ensuing page, find the link that is identified under "Document Category" as "UFR Excel Template" and click on it. (in this case, the link says "15-UFR18.xls", however since there's no consistent link naming scheme, the correct link will always have to be identified by the label under "Document Category" as mentioned. If the link doesn't exist, exit sub.)
On the ensuing page, click the "Download" link at the top and save the file under the following file path (which would be created at this time): C:\Documents\042786217\2018.
Edit: Code below gets me to the point where the download button is clicked, then I get the Open/Save/Cancel dialog box. Nearly there, just need to figure out how to save the file into a specific path.
Option Explicit
Sub UFRScraper()
If MsgBox("UFR Scraper will run now. Do you wish to continue?", vbYesNo) = vbNo Then Exit Sub
Dim IE As Object
Dim objElement As Object
Dim objCollection As Object
Dim ele As Object
Dim tbl_Providers As ListObject: Set tbl_Providers = ThisWorkbook.Worksheets("tbl_ProviderList").ListObjects("tbl_Providers")
Dim FEIN As String: FEIN = ""
Dim FEINList As Range: Set FEINList = tbl_Providers.ListColumns("FEIN").DataBodyRange
Dim ProviderName As String: ProviderName = ""
Dim ProviderNames As Range: Set ProviderNames = tbl_Providers.ListColumns("Provider Name").DataBodyRange
Dim FiscalYear As String: FiscalYear = ""
Dim urlUFRDetails As String: urlUFRDetails = ""
Dim i As Integer
' Create InternetExplorer Object
Set IE = CreateObject("InternetExplorer.Application")
' Show (True)/Hide (False) IE
IE.Visible = True
i = 1
For i = 1 To 3 'Limited to 3 during testing. Change when ready.
FEIN = FEINList(i, 1)
ProviderName = ProviderNames(i, 1)
urlUFRDetails = "https://ufr.osd.state.ma.us/WebAccess/SearchDetails.asp?ContractorID=" & FEIN & "&FilingYear=2018&nOrgPage=1&Year=2018"
IE.Navigate urlUFRDetails
' Wait while IE loading...
'IE ReadyState = 4 signifies the webpage has loaded (the first loop is set to avoid inadvertently skipping over the second loop)
Do While IE.ReadyState = 4: DoEvents: Loop 'Do While
Do Until IE.ReadyState = 4: DoEvents: Loop 'Do Until
'Step 2 is done here
Dim filingFound As Boolean: filingFound = False
For Each ele In IE.Document.getElementsByTagName("a")
If ele.innerText = "UFR Filing with Audited Financials" Then
filingFound = True
IE.Navigate ele.href
Do While IE.ReadyState = 4: DoEvents: Loop 'Do While
Do Until IE.ReadyState = 4: DoEvents: Loop 'Do Until
Exit For
End If
Next ele
If filingFound = False Then
GoTo Skip
End If
'Step 3
Dim j As Integer: j = 0
Dim UFRFileFound As Boolean: UFRFileFound = False
For Each ele In IE.Document.getElementsByTagName("li")
j = j + 1
If ele.innerText = "UFR Excel Template" Then
UFRFileFound = True
IE.Navigate "https://ufr.osd.state.ma.us/WebAccess/documentviewact.asp?counter=" & j - 4
Do While IE.ReadyState = 4: DoEvents: Loop 'Do While
Do Until IE.ReadyState = 4: DoEvents: Loop 'Do Until
Exit For
End If
Next ele
If UFRFileFound = False Then
GoTo Skip
End If
'Step 4
IE.Document.getElementById("LinkButton2").Click
'**Built in wait time to avoid accidentally overloading server with repeated quick requests during development and testing**
Skip:
Application.Wait (Now + TimeValue("0:00:03"))
MsgBox "Loop " & i & " complete."
Next i
'Unload IE
IE.Quit
Set IE = Nothing
Set objElement = Nothing
Set objCollection = Nothing
MsgBox "Process complete!"
End Sub
Upvotes: 1
Views: 646
Reputation: 2777
I have tried step 3 with some what lengthy way. but could not provide complete download code as (after one successful manual attempt) at present even manual download attempt causing massage "The File Could Not Be Retrieved" (maybe server side constrain)
Code only take you down to the cell containing href
of the xlx file
Dim doc As HTMLDocument
Dim Tbl As HTMLTable, Cel As HTMLTableCell, Rw As HTMLTableRow, Col As HTMLTableCol
Set doc = IE.document
For Each ele In IE.document.getElementsByClassName("boxedContent")
For Each Tbl In ele.getElementsByTagName("table")
For Each Rw In Tbl.Rows
For Each Cel In Rw.Cells
'Debug.Print Cel.innerText
If InStr(1, Cel.innerText, "UFR Excel Template") > 0 Then
Debug.Print Rw.Cells(2).innerText & " - " & Rw.Cells(2).innerHTML
End If
Next
Next Rw
Next Tbl
Next
Once the href
is available PtrSafe
Function or WinHTTPrequest
or other methods could be used to download the file. Welcome and eager to learn some more efficient answers in this case from experts like @QHarr and others.
Upvotes: 1