Carlos Casio
Carlos Casio

Reputation: 79

WebScraping with VBA - change value of InputBox

I'm experienced with VBA but really new with webscraping. So far I managed to extract some tables from other webpages but this one is giving me a hard time. The link is http://www.banxico.org.mx/SieInternet/consultarDirectorioInternetAction.do?sector=6&accion=consultarCuadro&idCuadro=CF102&locale=es

Basically, I click the arrow drop down list next to "Exportar Cuadro" button. After that, I need to change both dates that appear there to a specific one I will have into a variable.

How can I get to change that input boxes on webpage? My code so far is the next one:

Option Explicit

Sub test()

Dim URL As String, URL2 As String, URL3 As String, URL4 As String
Dim IE As Object, obj As Object, colTR As Object, doc As Object, tr As Object
Dim eleColtr As MSHTML.IHTMLElementCollection 'Element collection for tr tags
Dim eleColtd As MSHTML.IHTMLElementCollection 'Element collection for td tags
Dim eleRow As MSHTML.IHTMLElement 'Row elements
Dim eleCol As MSHTML.IHTMLElement 'Column elements
Dim objCollection As Object
Dim j As String, i As Integer


URL = "https://www.banxico.org.mx/SieInternet/consultarDirectorioInternetAction.do?sector=18&accion=consultarCuadroAnalitico&idCuadro=CA51&locale=es"
URL2 = "https://www.banxico.org.mx/SieInternet/consultarDirectorioInternetAction.do?sector=18&accion=consultarCuadroAnalitico&idCuadro=CA52&locale=es"
URL3 = "https://www.banxico.org.mx/SieInternet/consultarDirectorioInternetAction.do?sector=18&accion=consultarCuadroAnalitico&idCuadro=CA53&locale=es"
URL4 = "http://www.banxico.org.mx/SieInternet/consultarDirectorioInternetAction.do?sector=6&accion=consultarCuadro&idCuadro=CF102&locale=es"
'Tipos de cambio
Set IE = CreateObject("InternetExplorer.Application")

IE.Visible = True
IE.navigate URL4

Do While IE.Busy Or IE.readyState <> 4
    DoEvents
Loop

Application.Wait (Now + TimeValue("00:00:01"))

IE.document.getElementById("exportaCuadroToggle").Click

Set objCollection = IE.document.getElementsByTagName("ID")
i = 0
While i < objCollection.Length
    If objCollection(i).Value = "26/08/2019" Then
        ' Set text for search
        objCollection(i).Value = "01/08/2019"
    End If
    If objCollection(i).Name = "form-control form-control-sm fechaFin" Then
        ' Set text for search
        objCollection(i).Value = "01/08/2019"
    End If
Wend

End Sub

Note: URL, URL2 and URL3 are used in the complete code but I ommited that part for now because those links are already doing what I want.

Upvotes: 3

Views: 919

Answers (3)

QHarr
QHarr

Reputation: 84465

Looking at the API documentation referenced by @StavrosJon it seems you can do the following. The relevant API end point is:

GET series/:idSerie/datos/:fechaI/:fechaF

You can get a free token. The details regarding usage and limitations are here.

The API call requires a comma separated list of series ids as one of its parameters. You can hardcode these or, as I do, simply grab these from the existing webpage you reference and then pass in the subsequent API call. I regex out the necessary series ids.

The response is json - as detailed here - ergo you need a json parser to handle the response. I use jsonconverter.bas. Download raw code from here and add to standard module called jsonConverter . You then need to go VBE > Tools > References > Add reference to Microsoft Scripting Runtime.

I use some helper functions to ensure that I have correctly ordered dates output and that missing information is handled appropriately.

Sort output on titulo column if you want pairings of items e.g. Max / Min. Otherwise, you could implement a custom sort.


VBA:

Option Explicit
Public Sub GetData()
    '<  VBE > Tools > References > Microsoft Scripting Runtime
    Dim json As Object, re As Object, s As String, xhr As Object
    Dim startDate As String, endDate As String, ws As Worksheet, ids As String

    startDate = "2019-08-18"
    endDate = "2019-08-24"

    Dim datesDict As Object, headers(), results(), key As Variant, r As Long

    Set datesDict = GetDateDictionary(startDate, endDate)
    ReDim headers(1 To datesDict.Count + 2)
    headers(1) = "idSerie"
    headers(2) = "titulo"
    r = 3

    For Each key In datesDict.keys
        headers(r) = key
        r = r + 1
    Next

    Set ws = ThisWorkbook.Worksheets("Sheet1")
    Set re = CreateObject("VBScript.RegExp")
    Set xhr = CreateObject("MSXML2.XMLHTTP")

    With xhr
        .Open "GET", "http://www.banxico.org.mx/SieInternet/consultarDirectorioInternetAction.do?sector=6&accion=consultarCuadro&idCuadro=CF102&locale=es", False
        .send
        s = .responseText
        ids = GetIds(re, s)
        If ids = "No match" Then Exit Sub
        .Open "GET", "https://www.banxico.org.mx/SieAPIRest/service/v1/series/" & ids & "/datos/" & startDate & "/" & endDate & "", False 'https://www.banxico.org.mx/SieAPIRest/service/v1/doc/consultaDatosSerieRango
        .setRequestHeader "User-Agent", "Mozilla/5.0"
        .setRequestHeader "Bmx-Token", "aa833b22ee2a350192df6962b1eb6d8ea569ac895862ecc31b79b46859c7e74c" 'https://www.banxico.org.mx/SieAPIRest/service/v1/token  ''<== Replace with your generated token
        .send
        s = .responseText
    End With

    Set json = JsonConverter.ParseJson(s)("bmx")("series")

    ReDim results(1 To json.Count, 1 To UBound(headers))

    WriteOutResults ws, re, startDate, endDate, json, results, headers
End Sub

Public Sub WriteOutResults(ByVal ws As Worksheet, ByVal re As Object, ByVal startDate As String, ByVal endDate As String, ByVal json As Object, ByRef results(), ByRef headers())
    Dim item As Object, subItem As Object, key As Variant
    Dim r As Long, c As Long, datesDict As Object, nextKey As Variant
    re.Pattern = "\s{2,}"
    For Each item In json
        Set datesDict = GetDateDictionary(startDate, endDate)
        r = r + 1

        For Each key In item.keys
            Select Case key
            Case "idSerie"
                results(r, 1) = item(key)
            Case "titulo"
                results(r, 2) = re.Replace(item(key), Chr$(32))
            Case "datos"
                c = 3
                For Each subItem In item(key)
                    datesDict(subItem("fecha")) = subItem("dato")
                Next subItem
                For Each nextKey In datesDict.keys
                    results(r, c) = datesDict(nextKey)
                    c = c + 1
                Next
            End Select
        Next
    Next
    With ws
        .Cells(1, 1).Resize(1, UBound(headers)) = headers
        .Cells(2, 1).Resize(UBound(results, 1), UBound(results, 2)) = results
    End With
End Sub

Public Function GetIds(ByVal re As Object, ByVal responseText As String) As String
    Dim matches As Object, i As Long, dict As Object
    Set dict = CreateObject("Scripting.Dictionary")
    With re
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        .Pattern = "'(SF\d{5})'"                 'regex pattern to get json string
        If .test(responseText) Then
            Set matches = .Execute(responseText)
            For i = 0 To matches.Count - 1
                dict(matches(i).SubMatches(0)) = vbNullString
            Next
            GetIds = Join$(dict.keys, ",")
        Else
            GetIds = "No match"
        End If
    End With
End Function

Public Function GetDateDictionary(ByVal startDate As String, ByVal endDate As String) As Object
    Dim sDate As Long, eDate As Long
    Dim dateDict As Object, i As Long

    Set dateDict = CreateObject("Scripting.Dictionary")
    sDate = CDate(startDate)
    eDate = CDate(endDate)
    For i = sDate To eDate
        dateDict(Format$(i, "dd/mm/yyyy")) = vbNullString
    Next
    Set GetDateDictionary = dateDict
End Function

Example results:

enter image description here


Regex:

enter image description here

Upvotes: 2

Stavros Jon
Stavros Jon

Reputation: 1697

From what I can see, changing the dates in that drop down box, doesn't update the table displayed in the page, which means there's no point of scraping that.

Unless I'm missing something, it seems way easier to download the excel file and manipulate it with vba to get the data you need. Therefore I will not address the "how to change the dates in the inputbox" issue, because I find it futile. Instead, I will suggest an alternative approach.

If you inspect the network traffic using your browser's developer tools, you'll see that when you press the "Exportar cuadro" button, a GET request is being sent which uses as parameters the start and end dates in unix timestamp and returns the corresponding excel file. You only need the URL

Here's an example of how you could get the file:

Option Explicit

Sub Test()

Dim wb As Workbook
Dim url As String
Dim startDate As Double
Dim endDate As Double
startDate = ToUnix("10/08/2019") 'use whichever date you want
endDate = ToUnix("20/08/2019") 'use whichever date you want
url = "http://www.banxico.org.mx/SieInternet/consultarDirectorioInternetAction.do?sector=6&accion=consultarCuadro&idCuadro=CF102&locale=es&formatoXLS.x=1&fechaInicio=" & startDate & "&fechaFin=" & endDate
Set wb = Workbooks.Open(url)

End Sub

Public Function ToUnix(dt As Date) As Double 'credits to @Tim Williams
ToUnix = DateDiff("s", "1/1/1970", dt) * 1000
End Function

The code above, for demonstration purposes, will just open the report for two random dates. Once the workbook is stored in a workbook variable you can manipulate it as usual and do whatever you need with it.

You can modify the code to suit your needs.

Now, having said that, the website offers an API with extensive documentation and examples which you could use to get whatever info you need in a fast and reliable way. I would strongly recommend looking into it.

On a separate note, there's no such thing as an HTML tag named "ID", so this:

IE.document.getElementsByTagName("ID")

should return Nothing.

Upvotes: 2

ProfoundlyOblivious
ProfoundlyOblivious

Reputation: 1485

I was able to change the date by putting a breakpoint on:

Set objCollection = IE.document.getElementsByTagName("ID")

Then I used the immediate window to set a variable you already declared for me:

set elecol = ie.document.queryselector("#selecPeriodoCuadro > div > div > input.form-control.form-control-sm.fechaInicio")

And used the immediate window again to change the value of the element:

elecol.value = "20/07/2019"

You can work with the other date field using this string:

"#selecPeriodoCuadro > div > div > input.form-control.form-control-sm.fechaFin"

Upvotes: 1

Related Questions