Stupid_Intern
Stupid_Intern

Reputation: 3450

How to extract table values from a Website

https://www.morningstar.com/stocks/xnas/ATVI/price-fair-value

I am trying to extract the total returns data year wise

28.59   13.32   0.36    -12.34  69.68   14.13   93.25   -6.04   76.18   -25.92  3.99

from the above site. I need help regarding how to proceed next.

Option Explicit

Sub genOP()

Dim i As Long, fI As Long
Dim tickeR As String
Dim urlStr As String
Dim ie As New InternetExplorer

With INP
    fI = .Range("A" & .Rows.Count).End(xlUp).Row

    For i = 2 To fI
        tickeR = Trim(.Range("A" & i))
        Debug.Print tickeR
        'https://www.morningstar.com/stocks/xnas/abmd/price-fair-value'
        urlStr = "https://www.morningstar.com/stocks/xnas/" & tickeR & "/price-fair-value"
        Debug.Print urlStr
        With ie
            .Visible = True
            .navigate urlStr

            While .readyState <> 4
                DoEvents
            Wend
            Stop
            .document.getElementsByClassName("annual-data-row ng-scope").innerText

        End With
    Next i

End With

End Sub

Upvotes: 3

Views: 743

Answers (2)

QHarr
QHarr

Reputation: 84465

If you observe the web traffic you will see the page does an API xhr request for that chart info which is returned as json.

enter image description here

Examining the request sent we see the following:

enter image description here

The request url itself e.g.

https://api-global.morningstar.com/sal-service/v1/stock/priceFairValue/v2/0P00000053/data?secExchangeList=

The string 0P00000053 is a unique identifier for that ticker; I later refer to this as share_id (placeholder) and shareId variable. It is obtained from a request to the actual ticker page.

The query string param has no value and can be ignored. The important headers are highlighted. See key for explanation of importance.

The two red boxed headers uniquely identify a specific ticker uri.

The two blue require access keys provided in a js file.

You can make an initial request to get these access keys; an additional request to get a list of all tickers - I use the nasdaq 100 as source; or provide your own ticker list (examples given of each - comment lines out as required).

Own list use:

tickers = Array("ATVI") ''etc....extend

Nasdaq 100 list use:

tickers = GetNasdaqTickers(xhr, html)  

Requests to each ticker page must be made to retrieve the unique identifiers (contentId for "X-SAL-ContentType" header and shareId for API url) and then headers updated accordingly:

tickerName = tickers(ticker)
url = Replace$("https://www.morningstar.com/stocks/xnas/{ticker}/price-fair-value", "{ticker}", tickerName)
headersDict("Referer") = url
Set ids = GetContentIdShareId(xhr, url, re) 'Set up correct ids for each indiv request
headersDict("X-SAL-ContentType") = ids("contentId")

The API call is updated during a loop over tickers and the chart info parsed out using a json parser. I would use jsonconverter.bas to parse the json. Install the code from that link in a standard module called JsonConverter. All required project references are shown at top of code.

For ticker = LBound(tickers) To UBound(tickers)
   'other code
    nasdaqDict.Add tickerName, GetChartData(xhr, ids("shareId"), headersDict)
Next     

The function GetChartData returns a dictionary which has the chart dates as keys and chart values as values. Each returned dictionary, for a given ticker, is added to a parent dictionary, nasdaqDict. nasdaqDict has the ticker names as keys and associated chart dictionaries as values.

At the end, this parent dictionary is looped and all values written out to sheet by WriteOutDict.

You can explore the dictionary of dictionaries, nasdaqDict, here.


VBA Code:

Option Explicit

'VBE > Tools > References:
' Microsoft HTML Object Library
' Microsoft XML ,vn.0 e.g. Microsoft XML ,v6.0
' Microsoft VBScript Regular Expressions n.n e.g. Microsoft VBScript Regular Expressions 5.5
' Microsoft Scripting Runtime

Public Sub GetNasdaq100ChartValues()

    Dim re As VBScript_RegExp_55.RegExp, html As mshtml.HTMLDocument, xhr As MSXML2.XMLHTTP60
    Dim nasdaqDict As Scripting.Dictionary

    Set re = New VBScript_RegExp_55.RegExp
    Set html = New mshtml.HTMLDocument
    Set xhr = New MSXML2.XMLHTTP60

    '##Set-up **************************************************************************************************

    Dim headersDict As Scripting.Dictionary

    Set headersDict = New Scripting.Dictionary
    headersDict.Add "User-Agent", "Mozilla/5.0"
    headersDict.Add "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
    headersDict.Add "Referer", vbNullString
    headersDict.Add "ApiKey", vbNullString
    headersDict.Add "X-API-REALTIME-E", vbNullString
    headersDict.Add "X-SAL-ContentType", vbNullString

    Set headersDict = UpdateHeaders(xhr, re, headersDict)
    Set nasdaqDict = New Scripting.Dictionary    'This will be a dictionary of dictionaries with keys as ticker names _
                                                 and values as dictionaries containing the associated chart dates as keys and values as values.

    Dim ids As Scripting.Dictionary, tickerName As String, tickers(), ticker As Long, url As String

    '## This gets all nasdaq tickers (from https://www.cnbc.com/nasdaq-100/) and populates tickers with these.
    '## You could instead replace this with a manually supplied list of desired tickers e.g.
    tickers = Array("ATVI") ''etc....extend
    'tickers = GetNasdaqTickers(xhr, html)        ''comment this line out if passing hardcoded ticker values

    '##Get info ************************************************************************************************

    For ticker = LBound(tickers) To UBound(tickers)
        tickerName = tickers(ticker)
        url = Replace$("https://www.morningstar.com/stocks/xnas/{ticker}/price-fair-value", "{ticker}", tickerName)
        headersDict("Referer") = url
        Set ids = GetContentIdShareId(xhr, url, re) 'Set up correct ids for each indiv request
        headersDict("X-SAL-ContentType") = ids("contentId")
        nasdaqDict.Add tickerName, GetChartData(xhr, ids("shareId"), headersDict) 'make indiv API call for current ticker
    Next

    WriteOutDict nasdaqDict
End Sub

Public Function UpdateHeaders(ByVal xhr As MSXML2.XMLHTTP60, ByVal re As VBScript_RegExp_55.RegExp, ByVal headersDict As Scripting.Dictionary) As Scripting.Dictionary
    Dim s As String, accessKeys As VBScript_RegExp_55.MatchCollection
    Dim apiKey As String, apiRealtimeKey As String

    With xhr                                     'Make request to get keys from js file
        .Open "GET", "https://www.morningstar.com/assets/quotes/1.3.0/js/sal-components-wrapper.js", False
        .send
        s = .responseText
    End With
    With re
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        .Pattern = "apigee:""(.*?)""|realtime:""(.*?)""" 'regex pattern to return both api key values
        Set accessKeys = .Execute(s)
    End With

    apiKey = accessKeys.item(0).SubMatches(0)
    apiRealtimeKey = accessKeys.item(1).SubMatches(1)
    headersDict("ApiKey") = apiKey
    headersDict("X-API-REALTIME-E") = apiRealtimeKey

    Set UpdateHeaders = headersDict
End Function

Public Function GetNasdaqTickers(ByVal xhr As MSXML2.XMLHTTP60, ByVal html As HTMLDocument) As Variant
    Dim tickers As Object, results(), i As Long

    With xhr
        .Open "GET", "https://www.cnbc.com/nasdaq-100/", False
        .send
        html.body.innerHTML = .responseText
    End With

    Set tickers = html.querySelectorAll(".quoteTable a")

    ReDim results(0 To tickers.Length - 1)

    For i = 0 To tickers.Length - 1
        results(i) = tickers.item(i).innerText
    Next
    GetNasdaqTickers = results
End Function

Public Function GetContentIdShareId(ByVal xhr As MSXML2.XMLHTTP60, ByVal url As String, ByVal re As VBScript_RegExp_55.RegExp) As Scripting.Dictionary
    Dim ids As Scripting.Dictionary, s As String

    Set ids = New Scripting.Dictionary

    With xhr                                     'Make request to get keys from js file
        .Open "GET", url, False
        .send
        s = .responseText
    End With
    With re
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        .Pattern = "contentType:""(.*?)"",profile" 'regex pattern to get contentId
        ids("contentId") = .Execute(s)(0).SubMatches(0)
        .Pattern = "byId:{""(.*?)"""             'regex pattern to get shareId
        ids("shareId") = .Execute(s)(0).SubMatches(0)
    End With
    Set GetContentIdShareId = ids
End Function

Public Function GetChartData(ByVal xhr As MSXML2.XMLHTTP60, ByVal shareId As String, ByVal headersDict As Scripting.Dictionary) As Scripting.Dictionary
    Dim key As Variant, chartValues As Scripting.Dictionary, i As Long, json As Object
    Set chartValues = New Scripting.Dictionary
    With xhr
        .Open "GET", Replace$("https://api-global.morningstar.com/sal-service/v1/stock/priceFairValue/v2/{share_id}/data", "{share_id}", shareId), False
        For Each key In headersDict.keys
            .setRequestHeader key, headersDict(key)
        Next
        .send
        Set json = JsonConverter.ParseJson(.responseText)
    End With
    For i = 1 To json("columnDefs").Count        'collection
        chartValues(json("columnDefs")(i)) = json("table")("rows")(2)("datum")(i)
    Next
    Set GetChartData = chartValues
End Function

Public Sub WriteOutDict(ByVal nasdaqDict As Scripting.Dictionary)
    Dim key As Variant, row(), r As Long, headers()

    Application.ScreenUpdating = False

    headers = nasdaqDict(nasdaqDict.keys(0)).keys 'assumption that charts show for same time period for all tickers

    r = 2

    With ThisWorkbook.Worksheets("Sheet1")
        .Cells(1, 1) = "Ticker"
        .Cells(1, 2).Resize(1, UBound(headers) + 1) = headers
        For Each key In nasdaqDict.keys
            row = nasdaqDict(key).items
            .Cells(r, 1) = key
            .Cells(r, 2).Resize(1, UBound(row) + 1) = row
            r = r + 1
        Next
    End With
    Application.ScreenUpdating = True
End Sub

Sample of results:

enter image description here


Python:

I wrote with python initially in case of interest:

import requests, re
from bs4 import BeautifulSoup as bs

def get_chart_data(share_id):
    r = s.get(f'https://api-global.morningstar.com/sal-service/v1/stock/priceFairValue/v2/{share_id}/data', headers=headers).json()
    chart_values  = dict(zip(r['columnDefs'], r['table']['rows'][1]['datum']))
    return chart_values

headers = {
    'User-Agent': 'Mozilla/5.0',
    'Referer': '',
    'ApiKey': '',
    'X-API-REALTIME-E': '',
    'X-SAL-ContentType': '',
}

p = re.compile(r'apigee:"(.*?)"|realtime:"(.*?)"')
p1 = re.compile(r'contentType:"(.*?)",profile')
p2 = re.compile(r'byId:{"(.*?)"')

with requests.Session() as s:
    #set-up
    ###########################################################################
    ## This gets all nasdaq tickers and populates tickers with these.
    r = s.get('https://www.cnbc.com/nasdaq-100/')
    soup = bs(r.content, 'lxml')
    tickers = [i.text for i in soup.select('.quoteTable a')] 
    ## you could instead replace the above with a manually supplied list of desired tickers
    # tickers = ['tickerA','tickerB'] 
    ##########################################################################
    r = s.get('https://www.morningstar.com/assets/quotes/1.3.0/js/sal-components-wrapper.js')
    access_keys = p.findall(r.text)
    api_key = access_keys[0][0]
    api_realtime_key = access_keys[1][1]
    headers['ApiKey'] = api_key
    headers['X-API-REALTIME-E'] = api_realtime_key

    results = {}
    #specific
    for ticker in tickers:
        url = f'https://www.morningstar.com/stocks/xnas/{ticker}/price-fair-value'
        headers['Referer'] = url
        r = s.get(url)
        content_id = p1.findall(r.text)[0]
        share_id = p2.findall(r.text)[0] 
        headers['X-SAL-ContentType'] = content_id
        results[ticker] = get_chart_data(share_id)

Upvotes: 5

Mikku
Mikku

Reputation: 6654

Try:

Sub genOP()

Dim i As Long, fI As Long
Dim tickeR As String
Dim urlStr As String
Dim ie As New InternetExplorer

  Dim yr As Object 'Added By Mikku
  Dim j As Integer 'Added By Mikku


With INP
    fI = .Range("A" & .Rows.Count).End(xlUp).row

    For i = 2 To fI
        tickeR = Trim(.Range("A" & i))
        Debug.Print tickeR
        'https://www.morningstar.com/stocks/xnas/abmd/price-fair-value'
        urlStr = "https://www.morningstar.com/stocks/xnas/" & tickeR & "/price-fair-value"
        Debug.Print urlStr
        With ie
            .Visible = True
            .navigate urlStr

            Do While ie.Busy
                Application.Wait DateAdd("s", 1, Now)
            Loop


            Set yr = .document.getElementsByClassName("thead")               'Added By Mikku
            Set yr = .document.getElementsByClassName("ng-binding ng-scope") 'Added By Mikku

            'Debug.Print yr.Length                                            'Added By Mikku
            For j = 1 To 11                                                  'Added By Mikku
                Debug.Print yr(j).innerText & ":" & yr(j + 11).innerText     'Added By Mikku

            Next                                                             'Added By Mikku

        End With
    Next i

End With

End Sub

Demo:

enter image description here

Upvotes: 2

Related Questions