Elvis

Reputation: 425

Scrape data from website with multiple pages

https://finviz.com/screener.ashx?v=152&f=cap_midover&c=1,16,17,18,65

I want to scrape the data from the website above using VBA so that I can obtain the 5 columns I want (Ticker, EPS, EPS this Y, EPS next Y, Price). There are 99 pages to loop through, and each page has 20 tickers, so I need to scrape almost 2000 rows of data. I'm able to do this with Power Query, but it takes around 3 minutes to refresh the data.

I'm not sure whether scraping the data with VBA would speed up the refresh. I'm new to VBA, and my code below outputs the entire web page (not what I want) and doesn't loop through pages 1-99.

Sub GetFinvizData()
 
Dim str As String
 
'Delete existing data
Sheets("Data").Activate 'Name of sheet the data will be downloaded into. Change as required.
Range("A1").Select
Range(Selection, Selection.End(xlToRight)).Select
Range(Selection, Selection.End(xlDown)).Select
Selection.ClearContents
 
'Download stock quotes. Be patient - takes a few seconds.
str = "https://finviz.com/screener.ashx?v=152&f=cap_midover&r=1&c=1,16,17,18,65"
QueryQuote:
            With Sheets("Data").QueryTables.Add(Connection:="URL;" & str, Destination:=Sheets("Data").Range("a1"))
                .BackgroundQuery = True
                .TablesOnlyFromHTML = False
                .Refresh BackgroundQuery:=False
                .SaveData = True
            End With
 
Sheets("Data").Range("a1").CurrentRegion.TextToColumns Destination:=Sheets("Data").Range("a1"), DataType:=xlDelimited, _
TextQualifier:=xlDoubleQuote, ConsecutiveDelimiter:=False, Tab:=False, _
Semicolon:=False, Comma:=True, Space:=False, other:=True, OtherChar:=",", FieldInfo:=Array(1, 2)
 
Sheets("Data").Columns("A:B").ColumnWidth = 12
Range("A1").Select
 
End Sub

Upvotes: 0

Views: 981

Answers (2)

SIM

Reputation: 22440

Try the following to get the fields you mentioned across all the pages of that site:

Option Explicit
Sub FetchTabularData()
    Const base$ = "https://finviz.com/"
    Dim elem As Object, S$, R&, oPage As Object, nextPage$
    Dim Http As Object, Html As Object, ws As Worksheet, Url$
    
    Set ws = ThisWorkbook.Worksheets("Data")
    Set Http = CreateObject("MSXML2.XMLHTTP") 'late-bound XHR request object
    Set Html = CreateObject("HTMLFile")       'late-bound HTML parser
    
    Url = "https://finviz.com/screener.ashx?v=152&f=cap_midover&r=1&c=1,16,17,18,65"
    
    ws.Range("A1:E1") = Array("Ticker", "EPS", "EPS This Y", "EPS Next Y", "Price")
    
    R = 1
    
    While Url <> ""
        'Fetch the current page synchronously (False = wait for the response)
        With Http
            .Open "GET", Url, False
            .setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36"
            .send
            S = .responseText
        End With
    
        With Html
            .body.innerHTML = S
            'Data rows in the results table alternate between these two classes
            For Each elem In .getElementById("screener-content").getElementsByTagName("tr")
                If InStr(elem.className, "table-dark-row-cp") > 0 Or InStr(elem.className, "table-light-row-cp") > 0 Then
                    R = R + 1: ws.Cells(R, 1) = elem.Children(0).innerText
                    ws.Cells(R, 2) = elem.Children(1).innerText
                    ws.Cells(R, 3) = elem.Children(2).innerText
                    ws.Cells(R, 4) = elem.Children(3).innerText
                    ws.Cells(R, 5) = elem.Children(4).innerText
                End If
            Next elem
            
            Url = vbNullString 'stays empty (ending the loop) unless a "next" link is found
            
            'Follow the "next" pagination link, if present
            For Each oPage In .getElementsByTagName("a")
                If InStr(oPage.className, "tab-link") > 0 And InStr(oPage.innerText, "next") > 0 Then
                    nextPage = oPage.getAttribute("href")
                    Url = base & Replace(nextPage, "about:", "")
                End If
            Next oPage
        End With
    Wend
End Sub

You don't need to add any library references to run the script above; everything is late-bound.
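If you prefer early binding, for IntelliSense and compile-time checking, the two `CreateObject` calls can be swapped for typed declarations. A minimal sketch, assuming the "Microsoft XML, v6.0" and "Microsoft HTML Object Library" references are ticked under Tools > References:

```vba
'Early-bound equivalents of the two CreateObject calls above.
'Requires references: Microsoft XML, v6.0 and Microsoft HTML Object Library.
Dim Http As MSXML2.XMLHTTP60
Dim Html As MSHTML.HTMLDocument

Set Http = New MSXML2.XMLHTTP60
Set Html = New MSHTML.HTMLDocument
```

The rest of the routine is unchanged; late binding simply trades compile-time checks for portability across machines with different library versions installed.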

Upvotes: 1

Akhil Hothi

Reputation: 97

It's my fourth day learning VBA, so don't expect much. I also have no idea how to loop through the different pages and get the data into your sheet, so this isn't going to solve your problem, but I still want to share my thought.

If you end up making a separate sheet for each page, you can use the code below to delete the junk content you don't need. The junk seems limited to a particular range, so you can delete it after it lands on the sheet. This code still won't pull all the pages into different sheets; if you manage that part, the cleanup can follow.

Sub GetFinvizData()
Application.DisplayAlerts = False
Dim str As String
 
'Delete existing data
ActiveSheet.Activate 'Name of sheet the data will be downloaded into. Change as required.
Range("A1").Select
Range(Selection, Selection.End(xlToRight)).Select
Range(Selection, Selection.End(xlDown)).Select
Selection.ClearContents
 
'Download stock quotes. Be patient - takes a few seconds.
str = "https://finviz.com/screener.ashx?v=152&f=cap_midover&r=1&c=1,16,17,18,65"
QueryQuote:
            With ActiveSheet.QueryTables.Add(Connection:="URL;" & str, Destination:=ActiveSheet.Range("a1"))
                .BackgroundQuery = True
                .TablesOnlyFromHTML = False
                .Refresh BackgroundQuery:=False
                .SaveData = True
            End With
 
ActiveSheet.Range("a1").CurrentRegion.TextToColumns Destination:=ActiveSheet.Range("a1"), DataType:=xlDelimited, _
TextQualifier:=xlDoubleQuote, ConsecutiveDelimiter:=False, Tab:=False, _
Semicolon:=False, Comma:=True, Space:=False, other:=True, OtherChar:=",", FieldInfo:=Array(1, 2)
 
ActiveSheet.Columns("A:B").ColumnWidth = 12

    'Delete the junk ranges left over from the web query
    Range("1:20,42:58").Delete Shift:=xlUp

Range("A1").Select
Application.DisplayAlerts = True
 
End Sub
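On the looping part the question asks about: the screener URL appears to advance its `r` parameter by 20 per page (r=1, 21, 41, ...), so, as a sketch built on that assumption rather than a tested solution, the 99 pages could also be walked by rebuilding the URL instead of following "next" links:

```vba
'Sketch only: iterate pages by rewriting the r= offset
'(assumes 20 rows per page and 99 pages, per the question).
Sub LoopScreenerPages()
    Dim pg As Long, Url As String
    For pg = 0 To 98
        Url = "https://finviz.com/screener.ashx?v=152&f=cap_midover" & _
              "&r=" & (pg * 20 + 1) & "&c=1,16,17,18,65"
        'Fetch and parse Url here, e.g. with the XMLHTTP
        'approach from the first answer.
        Debug.Print Url
    Next pg
End Sub
```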

Upvotes: 0
