DeeTee
DeeTee

Reputation: 23

Scraping a table from a website using VBA

I'm a newbie in VBA and the website.

I'm trying to pull data (table) from the website below to be used in a VBA code.

http://www.bkam.ma/Marches/Principaux-indicateurs/Marche-obligataire/Marche-des-bons-de-tresor/Marche-secondaire/Taux-de-reference-des-bons-du-tresor?date=13%2F02%2F2019&block=e1d6b9bbf87f86f8ba53e8518e882982#address-c3367fcefc5f524397748201aee5dab8-e1d6b9bbf87f86f8ba53e8518e882982

I tried to create an Internet Explorer browser :

Dim appIE As Object
Set appIE = CreateObject("internetexplorer.application")

With appIE
    .Navigate "http://www.bkam.ma/Marches/Principaux-indicateurs/Marche-obligataire/Marche-des-bons-de-tresor/Marche-secondaire/Taux-de-reference-des-bons-du-tresor?date=13%2F02%2F2019&block=e1d6b9bbf87f86f8ba53e8518e882982#address-c3367fcefc5f524397748201aee5dab8-e1d6b9bbf87f86f8ba53e8518e882982"
    .Visible = True
End With

Do While appIE.Busy
    DoEvents
Loop

Then, I tried to use the ID or the tagname properties to get the data

Set val = appIE.document.getElementById()

I don't know how to get the elements of the table since they don't have an ID or Tagname that I can use . As you can see in this snippet from the source code

                              </span>
                                           </div>
                                       </th>
                                                                                                                                                                                        </tr>
                            </thead>
                            <tbody>
                                                
                                                                                         
                                                                                                                                                                                    <tr>
                             
         <td>18/03/2019</td>
      
         <td><span class="number">20,05</sapn>&nbsp;<span class="symbol"></span></td>
      
         <td><span class="number">2,250</sapn>&nbsp;<span class="symbol">%</span></td>
      
         <td>13/02/2019</td>
      
    
                             </tr>
                        
                                             

this snippet shows the first row of the table that I'm trying to extract.

Upvotes: 2

Views: 8183

Answers (2)

DarthVlader
DarthVlader

Reputation: 373

First, you can find the table based on its class attribute

Set HTMLTable = appIE.document.getElementsByClassName("dynamic_contents_ref_12")(0)

This will get the array of HTML elements with the class name dynamic_contents_ref_12 and return its first element.

Then, you can 'crawl' the table using the `.Children` attribute

This will get you the first row:

Set TBody = HTMLTable.Children(1) 'The <tbody> tag is the second child
Set Row1 = TBody.Children(0)      'The first <tr> inside the <tbody> tag

For each row put a different index in the brackets.

Now the HTML inside Row1 looks like

<tr>

  <td>
    18/03/2019
  </td>

  <td>
    <span class="number">
      20,05&nbsp;
      <span class="symbol"></span>
    </span>
  </td>

  <td>
    <span class="number">
      2,250&nbsp;
      <span class="symbol">%</span>
    </span>
  </td>

  <td>
    13/02/2019
  </td>

</tr>

(Each <td> is a cell in the row.)

To get the text inside a cell we can use the .innerText method, which returns a string:

CellA1 = Row1.Children(0).innerText ' = "05/04/2019"
CellB1 = Row1.Children(1).innerText ' = "43,85 "

Putting it all together

Using a For Each loop, we can get all the cells from the HTML table and copy them to a sheet - let's assume you want to start from cell A1.

'Table Headers
ActiveSheet.Range("A1").Value = "Date d'échéance"
ActiveSheet.Range("B1").Value = "Transaction"
ActiveSheet.Range("C1").Value = "Taux moyen pondéré"
ActiveSheet.Range("D1").Value = "Date de la valeur"

Set HTMLTable = appIE.document.getElementsByClassName("dynamic_contents_ref_12")(0)
Set TBody = HTMLTable.Children(1)
RowIndex = 2
For Each Row in TBody.Children
  ActiveSheet.Cells(RowIndex, 1).Value = Row.Children(0).innerText
  ActiveSheet.Cells(RowIndex, 2).Value = Row.Children(1).innerText
  ActiveSheet.Cells(RowIndex, 3).Value = Row.Children(2).innerText
  ActiveSheet.Cells(RowIndex, 4).Value = Row.Children(3).innerText
  RowIndex = RowIndex + 1
Next

Upvotes: 1

QHarr
QHarr

Reputation: 84465

You can avoid a browser and use xmlhttp to get the page content, then select the table element by its class (there is no id to use and class is the next fastest selector after id) and then loop the rows and columns writing out to sheet.

Option Explicit
Public Sub GetTable()
    Dim html As MSHTML.HTMLDocument, hTable As Object, ws As Worksheet
    Set ws = ThisWorkbook.Worksheets("Sheet1")
    Set html = New MSHTML.HTMLDocument                  '<  VBE > Tools > References > Microsoft Scripting Runtime
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "http://www.bkam.ma/Marches/Principaux-indicateurs/Marche-obligataire/Marche-des-bons-de-tresor/Marche-secondaire/Taux-de-reference-des-bons-du-tresor?date=13%2F02%2F2019&block=e1d6b9bbf87f86f8ba53e8518e882982#address-c3367fcefc5f524397748201aee5dab8-e1d6b9bbf87f86f8ba53e8518e882982", False
        .send
        html.body.innerHTML = .responseText
    End With
    Set hTable = html.querySelector(".dynamic_contents_ref_12")
    Dim td As Object, tr As Object, th As Object, r As Long, c As Long
    For Each tr In hTable.getElementsByTagName("tr")
        r = r + 1: c = 1
        For Each th In tr.getElementsByTagName("th")
            ws.Cells(r, c) = th.innerText
            c = c + 1
        Next
        For Each td In tr.getElementsByTagName("td")
            ws.Cells(r, c) = td.innerText
            c = c + 1
        Next
    Next
End Sub

Upvotes: 1

Related Questions