Reputation: 45
I'm trying to extract data from Yahoo Finance into cells in Excel.
I found this code.
It returns a very long text, essentially all the 'view code' html text of the required link.
The required data is there but it is not being parsed and returned.
The line in the responseText
where the data I require is given begins with this string:
<script type="application/json" data-sveltekit-fetched data-url="https://query1.finance.yahoo.com/v10/finance/quoteSummary/AAPL?formatted=true&modules=upgradeDowngradeHistory%2CrecommendationTrend%2Cfinanci
Part of the string for the 'numberOfAnalystOptions' looks like this:
"buy","numberOfAnalystOpinions":{"raw":37,"fmt":"37","longFmt":"37"},"totalCash":{"raw":67150000128,"fmt"
How do I get at the data this contains?
Sub SharePrices()
Const Url As String = "https://finance.yahoo.com/quote/AAPL/analysis?p=AAPL"
Dim sResp$, sHigh$, currentPrice$
Dim analystNum$, sLow$, tMeanprice$
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", Url, False
.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36"
.send
sResp = .responseText
End With
With CreateObject("VBScript.RegExp")
.Pattern = "numberOfAnalystOpinions[\s\S]+?raw"":(.*?),"
If .Execute(sResp).Count > 0 Then
analystNum = .Execute(sResp)(0).SubMatches(0)
End If
.Pattern = "targetMeanPrice[\s\S]+?raw"":(.*?),"
If .Execute(sResp).Count > 0 Then
tMeanprice = .Execute(sResp)(0).SubMatches(0)
End If
.Pattern = "targetHighPrice[\s\S]+?raw"":(.*?),"
If .Execute(sResp).Count > 0 Then
sHigh = .Execute(sResp)(0).SubMatches(0)
End If
.Pattern = "targetLowPrice[\s\S]+?raw"":(.*?),"
If .Execute(sResp).Count > 0 Then
sLow = .Execute(sResp)(0).SubMatches(0)
End If
.Pattern = "currentPrice[\s\S]+?raw"":(.*?),"
If .Execute(sResp).Count > 0 Then
currentPrice = .Execute(sResp)(0).SubMatches(0)
End If
End With
ActiveCell.Value = "Test"
ActiveCell.Offset(0, 1).Select
ActiveCell.Value = analystNum
ActiveCell.Offset(0, 1).Select
ActiveCell.Value = tMeanprice
ActiveCell.Offset(0, 1).Select
ActiveCell.Value = sHigh
ActiveCell.Offset(0, 1).Select
ActiveCell.Value = sLow
ActiveCell.Offset(0, 1).Select
ActiveCell.Value = currentPrice
End Sub
Upvotes: 1
Views: 492
Reputation: 18762
Comments on RegExp pattern
\\
in RegExp patterns.[\s\S]
matchs any char, but \\"":{\\""
is more preciesly match the reponse text.[\d.]+
to match a number is better than using .*?
.For
loop to simplify the codeSub SharePrices1()
Const Url As String = "https://finance.yahoo.com/quote/AAPL/analysis?p=AAPL"
Dim sResp$, sHigh$, currentPrice$
Dim analystNum$, sLow$, tMeanprice$
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", Url, False
.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36"
.send
sResp = .responseText
End With
Dim aKey, i As Long
Const SPAT = "\\"":{\\""raw\\"":([\d.]+),"
aKey = Split("numberOfAnalystOpinions targetMeanPrice targetHighPrice targetLowPrice currentPrice")
ActiveCell.Resize(1, UBound(aKey) + 1).Value = aKey
With CreateObject("VBScript.RegExp")
For i = 0 To UBound(aKey)
.Pattern = aKey(i) & SPAT
If .Execute(sResp).Count > 0 Then
ActiveCell.Offset(1, i).Value = .Execute(sResp)(0).SubMatches(0)
End If
Next
End With
End Sub
Upvotes: 1
Reputation: 166126
Since the data you want to extract is present in the page as a JSON string inside a script block, you can parse that into a JSON object and pick out the parts you want, instead of using regex string searching. This will be easier if you need to do any looping for example, or there's text which occurs more than once.
You will need to import JsonConverter.bas (from https://github.com/VBA-tools/VBA-JSON) into your VBA project.
Sub SharePrices()
Dim json As Object, sResp As String, html As Object, scripts, script
Dim scriptSource, fd As Object
Const Url As String = "https://finance.yahoo.com/quote/AAPL/analysis?p=AAPL"
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", Url, False
.send
sResp = .responseText
End With
'create an HTML document and load the response into it
Set html = CreateObject("htmlfile")
html.Open "text/html"
html.Write sResp
html.Close
'get all the <script> blocks
Set scripts = html.getElementsByTagName("script")
'find the script block which has the information we want...
For Each script In scripts 'find the script we want...
scriptSource = script.getAttribute("data-url") & "" 'append empty string to handle Null
If scriptSource Like "*upgradeDowngradeHistory*" Then
'parse the script content as json
Set json = JsonConverter.ParseJson(script.Text)
'actual data is embedded here - parse that
Set json = JsonConverter.ParseJson(json("body"))
End If
Next script
'didn't locate the required script ?
If json Is Nothing Then Exit Sub
'This part of the JSON object has the pieces of data we want
Set fd = json("quoteSummary")("result")(1)("financialData")
'verify the values. Here is where you'd write them to the sheet
Debug.Print fd("numberOfAnalystOpinions")("raw")
Debug.Print fd("targetMeanPrice")("raw")
Debug.Print fd("targetHighPrice")("raw")
Debug.Print fd("targetLowPrice")("raw")
Debug.Print fd("currentPrice")("raw")
End Sub
Upvotes: 1
Reputation: 3285
The response includes escaped JSON text, so quotation marks are preceded by the backslash (\
) which is the escape character in JSON.
You will need to include this in your regex as a literal character, which means you will have to escape it in your regex. The regex escape character is also the backslash, so this means that \\
represents a literal backslash in a regex.
So you will need to insert \\
before ""
in each of your regular expressions, like this:
.Pattern = "numberOfAnalystOpinions[\s\S]+?raw\\"":(.*?),"
.Pattern = "targetMeanPrice[\s\S]+?raw\\"":(.*?),"
and so on.
With these literal backslashes inserted, the regexes all match successfully. (I can't guarantee the results are what you expect, of course, especially as the returned HTML might vary.)
Upvotes: 2