Mikal Fischer
Mikal Fischer

Reputation: 21

VBA dynamic webpage scrape Excel

I have a question on how to scrape data from this web page:

http://tvc4.forexpros.com/init.php?family_prefix=tvc4&carrier=64694b96ed4909e815f1d10605ae4e83&time=1513525898&domain_ID=70&lang_ID=70&timezone_ID=31&pair_ID=171&interval=86400&refresh=4&session=session&client=1&user=200743128&width=650&height=750&init_page=instrument&m_pids=&watchlist=&site=https://au.investing.com&version=1.11.2

It appears to be held in the iframe and a bunch of javascripting is present on the screen.

When ever I try and collect the elements in the span or div or tr tags which are held under the iframe I dont seem to be able to collect the data that is within.

What I am targeting is the innertext which is held inside the elements of class="pane-legend-item-value pane-legend-line main".

Obviously the innertext will change depending where the cursor is on the screen at that particular time. So what I have tried to do is set up an IE which already has the page loaded and the cursor in the correct location, at the end of the graph (to give me the last data point) you can then move the cursor off the screen, I have then written some simple code to grab that IE window and then tried GetElements, it is at this point that I cannot obtain any data.

This is my code so far, its very rough as I've been attempting to edit as I read more options but have not had any wins :( ... Any ideas or help would be much appreciated! (screen shot is also at the bottom)

Sub InvestingCom()

    Dim IE As InternetExplorer
    Dim htmldoc As MSHTML.IHTMLDocument 'Document object
    Dim eleColth As MSHTML.IHTMLElementCollection 'Element collection for th tags
    Dim eleColtr As MSHTML.IHTMLElementCollection 'Element collection for tr tags
    Dim eleColtd As MSHTML.IHTMLElementCollection 'Element collection for td tags
    Dim eleRow As MSHTML.IHTMLElement 'Row elements
    Dim eleCol As MSHTML.IHTMLElement 'Column elements
    Dim elehr As MSHTML.IHTMLElement 'Header Element
    Dim iframeDoc As MSHTML.HTMLDocument
    Dim frame As HTMLIFrame
    Dim ieURL As String 'URL

    'Take Control of Open IE
    marker = 0
    Set objShell = CreateObject("Shell.Application")
    IE_count = objShell.Windows.Count
    For x = 0 To (IE_count - 1)
        On Error Resume Next
        my_url = objShell.Windows(x).document.Location
        my_title = objShell.Windows(x).document.Title

        If my_title Like "*" & "*" Then 'compare to find if the desired web page is already open
            Set IE = objShell.Windows(x)
            marker = 1
            Exit For
        Else
        End If
    Next

    'Extract data
    Set htmldoc = IE.document 'Document webpage

    ' I have tried span, tr, td etc tags and various other options
    ' I have never actually tried collecting an HTMLFrame but googled it however was unsuccessful
End Sub

Screenshot of the already existing IE which excel can find and talk to with excel and VB open on the other screen and the data I would like to scrape

Screenshot of the already existing IE which excel can find and talk to with excel and VB open on the other screen and the data I would like to scrape

Upvotes: 0

Views: 1787

Answers (1)

SIM
SIM

Reputation: 22440

It was really difficult for me to deal with two nested iframes from that page to collect the desired content. But anyways, I've fixed it finally. Run the below code and get the content you have requested for:

Sub forexpros()
    Dim IE As New InternetExplorer, html As HTMLDocument
    Dim frm As Object, frmano As Object, post As Object

    With IE
        .Visible = True
        .navigate "http://tvc4.forexpros.com/init.php?family_prefix=tvc4&carrier=64694b96ed4909e815f1d10605ae4e83&time=1513525898&domain_ID=70&lang_ID=70&timezone_ID=31&pair_ID=171&interval=86400&refresh=4&session=session&client=1&user=200743128&width=650&height=750&init_page=instrument&m_pids=&watchlist=&site=https://au.investing.com&version=1.11.2"
        Do Until .readyState = READYSTATE_COMPLETE: Loop
        Application.Wait (Now + TimeValue("0:00:05"))
        Set frm = .document.getElementsByClassName("abs") ''this is the first iframe
        .navigate frm(0).src
        Do Until .readyState = READYSTATE_COMPLETE: Loop
        Application.Wait (Now + TimeValue("0:00:05"))
        Set html = .document
    End With

    Set frmano = html.getElementsByTagName("iframe")(0).contentWindow.document  ''this is the second iframe

    For Each post In frmano.getElementsByClassName("pane-legend-item-value pane-legend-line main")
        Debug.Print post.innerText
    Next post
    IE.Quit
End Sub

Upvotes: 2

Related Questions