Reputation: 79
I'm experienced with VBA but really new with webscraping. So far I managed to extract some tables from other webpages but this one is giving me a hard time. The link is http://www.banxico.org.mx/SieInternet/consultarDirectorioInternetAction.do?sector=6&accion=consultarCuadro&idCuadro=CF102&locale=es
Basically, I click the arrow drop down list next to "Exportar Cuadro" button. After that, I need to change both dates that appear there to a specific one I will have into a variable.
How can I get to change that input boxes on webpage? My code so far is the next one:
Option Explicit
Sub test()
Dim URL As String, URL2 As String, URL3 As String, URL4 As String
Dim IE As Object, obj As Object, colTR As Object, doc As Object, tr As Object
Dim eleColtr As MSHTML.IHTMLElementCollection 'Element collection for tr tags
Dim eleColtd As MSHTML.IHTMLElementCollection 'Element collection for td tags
Dim eleRow As MSHTML.IHTMLElement 'Row elements
Dim eleCol As MSHTML.IHTMLElement 'Column elements
Dim objCollection As Object
Dim j As String, i As Integer
URL = "https://www.banxico.org.mx/SieInternet/consultarDirectorioInternetAction.do?sector=18&accion=consultarCuadroAnalitico&idCuadro=CA51&locale=es"
URL2 = "https://www.banxico.org.mx/SieInternet/consultarDirectorioInternetAction.do?sector=18&accion=consultarCuadroAnalitico&idCuadro=CA52&locale=es"
URL3 = "https://www.banxico.org.mx/SieInternet/consultarDirectorioInternetAction.do?sector=18&accion=consultarCuadroAnalitico&idCuadro=CA53&locale=es"
URL4 = "http://www.banxico.org.mx/SieInternet/consultarDirectorioInternetAction.do?sector=6&accion=consultarCuadro&idCuadro=CF102&locale=es"
'Tipos de cambio
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
IE.navigate URL4
Do While IE.Busy Or IE.readyState <> 4
DoEvents
Loop
Application.Wait (Now + TimeValue("00:00:01"))
IE.document.getElementById("exportaCuadroToggle").Click
Set objCollection = IE.document.getElementsByTagName("ID")
i = 0
While i < objCollection.Length
If objCollection(i).Value = "26/08/2019" Then
' Set text for search
objCollection(i).Value = "01/08/2019"
End If
If objCollection(i).Name = "form-control form-control-sm fechaFin" Then
' Set text for search
objCollection(i).Value = "01/08/2019"
End If
Wend
End Sub
Note: URL
, URL2
and URL3
are used in the complete code but I ommited that part for now because those links are already doing what I want.
Upvotes: 3
Views: 919
Reputation: 84465
Looking at the API documentation referenced by @StavrosJon it seems you can do the following. The relevant API end point is:
GET series/:idSerie/datos/:fechaI/:fechaF
You can get a free token. The details regarding usage and limitations are here.
The API call requires a comma separated list of series ids as one of its parameters. You can hardcode these or, as I do, simply grab these from the existing webpage you reference and then pass in the subsequent API call. I regex out the necessary series ids.
The response is json - as detailed here - ergo you need a json parser to handle the response. I use jsonconverter.bas. Download raw code from here and add to standard module called jsonConverter . You then need to go VBE > Tools > References > Add reference to Microsoft Scripting Runtime.
I use some helper functions to ensure that I have correctly ordered dates output and that missing information is handled appropriately.
Sort output on titulo
column if you want pairings of items e.g. Max / Min. Otherwise, you could implement a custom sort.
VBA:
Option Explicit
Public Sub GetData()
'< VBE > Tools > References > Microsoft Scripting Runtime
Dim json As Object, re As Object, s As String, xhr As Object
Dim startDate As String, endDate As String, ws As Worksheet, ids As String
startDate = "2019-08-18"
endDate = "2019-08-24"
Dim datesDict As Object, headers(), results(), key As Variant, r As Long
Set datesDict = GetDateDictionary(startDate, endDate)
ReDim headers(1 To datesDict.Count + 2)
headers(1) = "idSerie"
headers(2) = "titulo"
r = 3
For Each key In datesDict.keys
headers(r) = key
r = r + 1
Next
Set ws = ThisWorkbook.Worksheets("Sheet1")
Set re = CreateObject("VBScript.RegExp")
Set xhr = CreateObject("MSXML2.XMLHTTP")
With xhr
.Open "GET", "http://www.banxico.org.mx/SieInternet/consultarDirectorioInternetAction.do?sector=6&accion=consultarCuadro&idCuadro=CF102&locale=es", False
.send
s = .responseText
ids = GetIds(re, s)
If ids = "No match" Then Exit Sub
.Open "GET", "https://www.banxico.org.mx/SieAPIRest/service/v1/series/" & ids & "/datos/" & startDate & "/" & endDate & "", False 'https://www.banxico.org.mx/SieAPIRest/service/v1/doc/consultaDatosSerieRango
.setRequestHeader "User-Agent", "Mozilla/5.0"
.setRequestHeader "Bmx-Token", "aa833b22ee2a350192df6962b1eb6d8ea569ac895862ecc31b79b46859c7e74c" 'https://www.banxico.org.mx/SieAPIRest/service/v1/token ''<== Replace with your generated token
.send
s = .responseText
End With
Set json = JsonConverter.ParseJson(s)("bmx")("series")
ReDim results(1 To json.Count, 1 To UBound(headers))
WriteOutResults ws, re, startDate, endDate, json, results, headers
End Sub
Public Sub WriteOutResults(ByVal ws As Worksheet, ByVal re As Object, ByVal startDate As String, ByVal endDate As String, ByVal json As Object, ByRef results(), ByRef headers())
Dim item As Object, subItem As Object, key As Variant
Dim r As Long, c As Long, datesDict As Object, nextKey As Variant
re.Pattern = "\s{2,}"
For Each item In json
Set datesDict = GetDateDictionary(startDate, endDate)
r = r + 1
For Each key In item.keys
Select Case key
Case "idSerie"
results(r, 1) = item(key)
Case "titulo"
results(r, 2) = re.Replace(item(key), Chr$(32))
Case "datos"
c = 3
For Each subItem In item(key)
datesDict(subItem("fecha")) = subItem("dato")
Next subItem
For Each nextKey In datesDict.keys
results(r, c) = datesDict(nextKey)
c = c + 1
Next
End Select
Next
Next
With ws
.Cells(1, 1).Resize(1, UBound(headers)) = headers
.Cells(2, 1).Resize(UBound(results, 1), UBound(results, 2)) = results
End With
End Sub
Public Function GetIds(ByVal re As Object, ByVal responseText As String) As String
Dim matches As Object, i As Long, dict As Object
Set dict = CreateObject("Scripting.Dictionary")
With re
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = "'(SF\d{5})'" 'regex pattern to get json string
If .test(responseText) Then
Set matches = .Execute(responseText)
For i = 0 To matches.Count - 1
dict(matches(i).SubMatches(0)) = vbNullString
Next
GetIds = Join$(dict.keys, ",")
Else
GetIds = "No match"
End If
End With
End Function
Public Function GetDateDictionary(ByVal startDate As String, ByVal endDate As String) As Object
Dim sDate As Long, eDate As Long
Dim dateDict As Object, i As Long
Set dateDict = CreateObject("Scripting.Dictionary")
sDate = CDate(startDate)
eDate = CDate(endDate)
For i = sDate To eDate
dateDict(Format$(i, "dd/mm/yyyy")) = vbNullString
Next
Set GetDateDictionary = dateDict
End Function
Example results:
Regex:
Upvotes: 2
Reputation: 1697
From what I can see, changing the dates in that drop down box, doesn't update the table displayed in the page, which means there's no point of scraping that.
Unless I'm missing something, it seems way easier to download the excel file and manipulate it with vba to get the data you need. Therefore I will not address the "how to change the dates in the inputbox" issue, because I find it futile. Instead, I will suggest an alternative approach.
If you inspect the network traffic using your browser's developer tools, you'll see that when you press the "Exportar cuadro" button, a GET
request is being sent which uses as parameters the start and end dates in unix timestamp and returns the corresponding excel file. You only need the URL
Here's an example of how you could get the file:
Option Explicit
Sub Test()
Dim wb As Workbook
Dim url As String
Dim startDate As Double
Dim endDate As Double
startDate = ToUnix("10/08/2019") 'use whichever date you want
endDate = ToUnix("20/08/2019") 'use whichever date you want
url = "http://www.banxico.org.mx/SieInternet/consultarDirectorioInternetAction.do?sector=6&accion=consultarCuadro&idCuadro=CF102&locale=es&formatoXLS.x=1&fechaInicio=" & startDate & "&fechaFin=" & endDate
Set wb = Workbooks.Open(url)
End Sub
Public Function ToUnix(dt As Date) As Double 'credits to @Tim Williams
ToUnix = DateDiff("s", "1/1/1970", dt) * 1000
End Function
The code above, for demonstration purposes, will just open the report for two random dates. Once the workbook is stored in a workbook variable you can manipulate it as usual and do whatever you need with it.
You can modify the code to suit your needs.
Now, having said that, the website offers an API with extensive documentation and examples which you could use to get whatever info you need in a fast and reliable way. I would strongly recommend looking into it.
On a separate note, there's no such thing as an HTML tag named "ID", so this:
IE.document.getElementsByTagName("ID")
should return Nothing
.
Upvotes: 2
Reputation: 1485
I was able to change the date by putting a breakpoint on:
Set objCollection = IE.document.getElementsByTagName("ID")
Then I used the immediate window to set a variable you already declared for me:
set elecol = ie.document.queryselector("#selecPeriodoCuadro > div > div > input.form-control.form-control-sm.fechaInicio")
And used the immediate window again to change the value of the element:
elecol.value = "20/07/2019"
You can work with the other date field using this string:
"#selecPeriodoCuadro > div > div > input.form-control.form-control-sm.fechaFin"
Upvotes: 1