Reputation: 13
I would like to scrape Dividend Future Prices from HKEX.
Here's the URL of this site : http://www.hkex.com.hk/Market-Data/Futures-and-Options-Prices/Equity-Index/HSCEI-Dividend-Futures?sc_lang=en#&product=DHH
I wanted to scrape Prev.Day settlement price of the "Dec-19 Contract" via VBA. However, it doesn't have any class name or id, so I have no idea how to access the information.
<tr>
<td>Dec-19</td>
<td>-</td>
<td>-</td>
<td>413.78</td>
<td>
-
<br>
-
</td>
<td>-</td>
<td>
-
<br>
-
</td>
<td>-<td>
<td>17,330</td>
</tr>
How can I scrape this via VBA?
Upvotes: 1
Views: 2497
Reputation: 84465
You could also simply use a CSS selector and no loop:
html.querySelectorAll("td:nth-child(4)")(1).innerText
This method is fragile. If the style on the page changes this may break.
CSS selector:
If you observe the relevant part of the page (showing first contract year with headers for context and with chart between contract years removed):
The associated HTML for contract year 2019 is:
Prev.Day Settlement Price
is the 4th td
within this i.e. CSS selector td:nth-child(4)
.
This pattern is repeated for all contract years so you can return a nodeList of all matches to this (i.e. every td:nth-child(4)
with the .querySelectorAll
method).
Year 2019 is at index position 1; this is the second element in a 0 based indexed nodeList, so you access with .querySelectorAll("td:nth-child(4)")(1)
.
CSS query result - first few results:
Upvotes: 1
Reputation: 22440
It's really the hell of an automation to find specific item with no remarkable flag attached to it. However, I've written this script without hardcoding index to the elements. Give this a shot and get your desired values:
Sub Hkex_Data()
Dim IE As New InternetExplorer, html As HTMLDocument
Dim posts As Object
With IE
.Visible = False
.navigate "http://www.hkex.com.hk/Market-Data/Futures-and-Options-Prices/Equity-Index/HSCEI-Dividend-Futures?sc_lang=en#&product=DHH"
Do Until .readyState = READYSTATE_COMPLETE: Loop
Set html = .document
End With
Application.Wait (Now + TimeValue("0:00:05"))
For Each posts In html.getElementsByClassName("hsirowcon")
Row = Row + 1: Cells(Row, 1) = posts.NextSibling.NextSibling.FirstChild.innerText
Cells(Row, 2) = posts.NextSibling.NextSibling.LastChild.innerText
Next posts
IE.Quit
End Sub
Result:
19-Dec 17,330
Reference to add to the library:
Microsoft internet controls
Microsoft Html Object Library
Upvotes: 1
Reputation: 2256
Use getElementsByTagName. Identify your and then go through each row and each td in rows. Something like that.
Dim objTR As IHTMLElement
Dim objTD As IHTMLElement
Dim objTable As IHTMLElement
For Each objTR In objTable.getElementsByTagName("tr")
For Each objTD In objTR
'do something with objtd.innerText
Next objTD
Next objTR
or you can declare your variables as Object if you prefer late binding.
Upvotes: 1