TheTerribleProgrammer
TheTerribleProgrammer

Reputation: 45

Extract data from Yahoo Finance into cells in an Excel spreadsheet using VBA

I'm trying to extract data from Yahoo Finance into cells in Excel.

I found this code.

It returns a very long text, essentially all the 'view code' html text of the required link.
The required data is there but it is not being parsed and returned.
The line in the responseText where the data I require is given begins with this string:

<script type="application/json" data-sveltekit-fetched data-url="https://query1.finance.yahoo.com/v10/finance/quoteSummary/AAPL?formatted=true&amp;modules=upgradeDowngradeHistory%2CrecommendationTrend%2Cfinanci

Part of the string for the 'numberOfAnalystOptions' looks like this:

"buy","numberOfAnalystOpinions":{"raw":37,"fmt":"37","longFmt":"37"},"totalCash":{"raw":67150000128,"fmt"

How do I get at the data this contains?

Sub SharePrices()
    
    Const Url As String = "https://finance.yahoo.com/quote/AAPL/analysis?p=AAPL"
    Dim sResp$, sHigh$, currentPrice$
    Dim analystNum$, sLow$, tMeanprice$

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", Url, False
        .setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36"
        .send
        sResp = .responseText
    End With

    With CreateObject("VBScript.RegExp")
        .Pattern = "numberOfAnalystOpinions[\s\S]+?raw"":(.*?),"
        If .Execute(sResp).Count > 0 Then
            analystNum = .Execute(sResp)(0).SubMatches(0)
        End If

        .Pattern = "targetMeanPrice[\s\S]+?raw"":(.*?),"
        If .Execute(sResp).Count > 0 Then
            tMeanprice = .Execute(sResp)(0).SubMatches(0)
        End If

        .Pattern = "targetHighPrice[\s\S]+?raw"":(.*?),"
        If .Execute(sResp).Count > 0 Then
            sHigh = .Execute(sResp)(0).SubMatches(0)
        End If

        .Pattern = "targetLowPrice[\s\S]+?raw"":(.*?),"
        If .Execute(sResp).Count > 0 Then
            sLow = .Execute(sResp)(0).SubMatches(0)
        End If

        .Pattern = "currentPrice[\s\S]+?raw"":(.*?),"
        If .Execute(sResp).Count > 0 Then
            currentPrice = .Execute(sResp)(0).SubMatches(0)
        End If
    End With

    ActiveCell.Value = "Test"
    ActiveCell.Offset(0, 1).Select
    ActiveCell.Value = analystNum
    ActiveCell.Offset(0, 1).Select
    ActiveCell.Value = tMeanprice
    ActiveCell.Offset(0, 1).Select
    ActiveCell.Value = sHigh
    ActiveCell.Offset(0, 1).Select
    ActiveCell.Value = sLow
    ActiveCell.Offset(0, 1).Select
    ActiveCell.Value = currentPrice
    
End Sub

Upvotes: 1

Views: 492

Answers (3)

taller
taller

Reputation: 18762

Comments on RegExp pattern

  • The backslash should be escaped as \\ in RegExp patterns.
  • [\s\S] matchs any char, but \\"":{\\"" is more preciesly match the reponse text.
  • Using [\d.]+ to match a number is better than using .*?.

  • Using For loop to simplify the code
Sub SharePrices1()
    Const Url As String = "https://finance.yahoo.com/quote/AAPL/analysis?p=AAPL"
    Dim sResp$, sHigh$, currentPrice$
    Dim analystNum$, sLow$, tMeanprice$
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", Url, False
        .setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36"
        .send
        sResp = .responseText
    End With
    Dim aKey, i As Long
    Const SPAT = "\\"":{\\""raw\\"":([\d.]+),"
    aKey = Split("numberOfAnalystOpinions targetMeanPrice targetHighPrice targetLowPrice currentPrice")
    ActiveCell.Resize(1, UBound(aKey) + 1).Value = aKey
    With CreateObject("VBScript.RegExp")
        For i = 0 To UBound(aKey)
            .Pattern = aKey(i) & SPAT 
            If .Execute(sResp).Count > 0 Then
                ActiveCell.Offset(1, i).Value = .Execute(sResp)(0).SubMatches(0)
            End If
        Next
    End With
End Sub

Upvotes: 1

Tim Williams
Tim Williams

Reputation: 166126

Since the data you want to extract is present in the page as a JSON string inside a script block, you can parse that into a JSON object and pick out the parts you want, instead of using regex string searching. This will be easier if you need to do any looping for example, or there's text which occurs more than once.

You will need to import JsonConverter.bas (from https://github.com/VBA-tools/VBA-JSON) into your VBA project.

Sub SharePrices()
    
    Dim json As Object, sResp As String, html As Object, scripts, script
    Dim scriptSource, fd As Object
    Const Url As String = "https://finance.yahoo.com/quote/AAPL/analysis?p=AAPL"
    
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", Url, False
        .send
        sResp = .responseText
    End With
    
    'create an HTML document and load the response into it
    Set html = CreateObject("htmlfile")
    html.Open "text/html"
    html.Write sResp
    html.Close
    
    'get all the <script> blocks
    Set scripts = html.getElementsByTagName("script")
 
    'find the script block which has the information we want...
    For Each script In scripts 'find the script we want...
        scriptSource = script.getAttribute("data-url") & "" 'append empty string to handle Null
        If scriptSource Like "*upgradeDowngradeHistory*" Then
            'parse the script content as json
            Set json = JsonConverter.ParseJson(script.Text)
            'actual data is embedded here - parse that
            Set json = JsonConverter.ParseJson(json("body")) 
        End If
    Next script
    
    'didn't locate the required script ?
    If json Is Nothing Then Exit Sub 
    
    'This part of the JSON object has the pieces of data we want
    Set fd = json("quoteSummary")("result")(1)("financialData")

    'verify the values. Here is where you'd write them to the sheet
    Debug.Print fd("numberOfAnalystOpinions")("raw")
    Debug.Print fd("targetMeanPrice")("raw")
    Debug.Print fd("targetHighPrice")("raw")
    Debug.Print fd("targetLowPrice")("raw")
    Debug.Print fd("currentPrice")("raw")
End Sub

Upvotes: 1

Neil T
Neil T

Reputation: 3285

The response includes escaped JSON text, so quotation marks are preceded by the backslash (\) which is the escape character in JSON.

You will need to include this in your regex as a literal character, which means you will have to escape it in your regex. The regex escape character is also the backslash, so this means that \\ represents a literal backslash in a regex.

So you will need to insert \\ before "" in each of your regular expressions, like this:

.Pattern = "numberOfAnalystOpinions[\s\S]+?raw\\"":(.*?),"

.Pattern = "targetMeanPrice[\s\S]+?raw\\"":(.*?),"

and so on.

With these literal backslashes inserted, the regexes all match successfully. (I can't guarantee the results are what you expect, of course, especially as the returned HTML might vary.)

Upvotes: 2

Related Questions