Part_Time_Nerd

Reputation: 1014

Using Excel VBA to scrape HTML

I have been trying to scrape and parse a website for financial data so that I can add it to an Excel spreadsheet using VBA. I have found several possible solutions, but I cannot get them to fit my requirements. I only need one value (Average Target Price) from a table, and I have not been able to figure out what I am doing wrong. I will also be using similar VBA code to check several hundred companies at a time, so if there is a more efficient way to write what I have, please let me know.

Here is what I have so far:

Sub ImportAnalystEst()

Dim oHtml       As HTMLDocument
Dim oElement    As IHTMLElement

Set oHtml = New HTMLDocument

With CreateObject("WINHTTP.WinHTTPRequest.5.1")
    .Open "GET", "http://www.marketwatch.com/investing/stock/aapl/analystestimates", False
    .send
    oHtml.body.innerHTML = .responseText
End With

Dim wsTarget As Worksheet
Dim i As Integer
i = 1
Set wsTarget = ActiveWorkbook.Worksheets("Sheet1")

For Each oElement In oHtml.getElementsByClassName("snapshot")
  wsTarget.Range("A" & i) = Split(oElement.Children(0).innerText, "<TD>")
  i = i + 1
Next

End Sub

Here is the HTML I am trying to pull from. Can someone please give an example of how I could extract the average target price of 146.52?

<div class="analystEstimates">

<div class="block">
    <h2>Snapshot</h2>
</div>
<table class="snapshot">
    <tbody>
        <tr>
            <td class="first">Average Recommendation:</td>
            <td class="recommendation">
                Overweight
            </td>
            <td class="first column2">Average Target Price:</td>
            <td>146.52</td>
        </tr>
        <tr>
            <td class="first">Number of Ratings:</td>
            <td>

Upvotes: 1

Views: 3777

Answers (3)

QHarr

Reputation: 84465

It is far easier to use a CSS selector combination to target the value by its position: the table cell that immediately follows the "Average Target Price:" label cell in the first row of the table. The CSS selector is .snapshot .first.column2 + td, which uses the "." class selector, the " " descendant combinator, and the "+" adjacent sibling combinator.

Option Explicit
' Requires a reference to Microsoft HTML Object Library (VBE > Tools > References).
Public Sub ImportAnalystEst()
    Dim oHtml As HTMLDocument

    Set oHtml = New HTMLDocument

    With CreateObject("WINHTTP.WinHTTPRequest.5.1")
        .Open "GET", "http://www.marketwatch.com/investing/stock/aapl/analystestimates", False
        .send
        oHtml.body.innerHTML = .responseText
    End With
    ' Print the text of the cell that follows the "Average Target Price:" label
    Debug.Print oHtml.querySelector(".snapshot .first.column2 + td").innerText
End Sub
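
For the several hundred companies mentioned in the question, the same selector could be applied in a loop. The following is only a sketch under assumptions that are not in the original: the tickers are listed in column A of Sheet1 starting at A1, the result is written to column B, and every ticker page follows the same URL pattern and markup as the AAPL page. The names ImportAnalystEstForTickers and GetTargetPrice are hypothetical.

```vba
Option Explicit
' Requires a reference to Microsoft HTML Object Library (for HTMLDocument).

' Hypothetical driver: reads tickers from column A and writes prices to column B.
Public Sub ImportAnalystEstForTickers()
    Dim ws As Worksheet
    Dim lastRow As Long, r As Long

    Set ws = ActiveWorkbook.Worksheets("Sheet1")
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row

    For r = 1 To lastRow
        ws.Cells(r, "B").Value = GetTargetPrice(CStr(ws.Cells(r, "A").Value))
    Next r
End Sub

' Hypothetical helper: returns the target price text, or #N/A if the cell is not found.
Private Function GetTargetPrice(ByVal ticker As String) As Variant
    Dim oHtml As HTMLDocument
    Dim oCell As Object

    Set oHtml = New HTMLDocument
    With CreateObject("WINHTTP.WinHTTPRequest.5.1")
        ' Assumes every ticker uses the same URL pattern as the AAPL page
        .Open "GET", "http://www.marketwatch.com/investing/stock/" & LCase$(ticker) & "/analystestimates", False
        .send
        oHtml.body.innerHTML = .responseText
    End With

    Set oCell = oHtml.querySelector(".snapshot .first.column2 + td")
    If oCell Is Nothing Then
        GetTargetPrice = CVErr(xlErrNA)
    Else
        GetTargetPrice = Trim$(oCell.innerText)
    End If
End Function
```

Checking for Nothing before reading innerText keeps one missing or differently structured page from stopping the whole run.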

Upvotes: 1

ASH

Reputation: 20302

This will do what you want.

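Sub Test()
    Dim IE As Object
    Dim x As String, Z As String
    Dim y As Long
    Const PRICE_LABEL As String = "Average Target Price:"

    Set IE = CreateObject("InternetExplorer.Application")
    With IE
        .Visible = True
        .Navigate "http://www.marketwatch.com/investing/stock/aapl/analystestimates" ' should work for any URL
        Do Until .ReadyState = 4: DoEvents: Loop

        x = .document.body.innerText
        y = InStr(1, x, PRICE_LABEL)

        If y > 0 Then
            ' Read the text after the label; Mid(x, y, 6) would return "Averag"
            Z = Mid(x, y + Len(PRICE_LABEL), 30)
            ' Treat tabs and line breaks as spaces, then keep the first token
            Z = Trim(Replace(Replace(Replace(Z, vbTab, " "), vbCr, " "), vbLf, " "))
            If Len(Z) > 0 Then Range("A1").Value = Split(Z, " ")(0)
        End If

        .Quit
    End With
End Sub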

Upvotes: 0

Part_Time_Nerd

Reputation: 1014

I was able to solve my issue with the following:

Sub ImportAnalystEst()
    Dim oHtml       As HTMLDocument
    Dim oElement    As IHTMLElement

    Set oHtml = New HTMLDocument

    With CreateObject("WINHTTP.WinHTTPRequest.5.1")
        .Open "GET", "http://www.marketwatch.com/investing/stock/aapl/analystestimates", False
        .send
        oHtml.body.innerHTML = .responseText
    End With

    Dim wsTarget As Worksheet
    Dim i As Integer
    i = 1
    Set wsTarget = ActiveWorkbook.Worksheets("Sheet1")

    For Each oElement In oHtml.getElementsByClassName("snapshot")
        ' Split the first row's HTML on "TD"; piece 7 is the target price wrapped in ">" and "</"
        wsTarget.Range("A" & i) = Split(oHtml.getElementsByClassName("snapshot").Item(0).FirstChild.FirstChild.innerHTML, "TD")(7)
        ' Strip the leftover tag characters
        wsTarget.Range("A" & i) = Replace(wsTarget.Range("A" & i), ">", "")
        wsTarget.Range("A" & i) = Replace(wsTarget.Range("A" & i), "</", "")
        i = i + 1
    Next
End Sub

Upvotes: 1
