will199
will199

Reputation: 23

Using XMLHTTP object to parse some websites in VBA

I am trying to pick up "key people" field from a Wikipedia page: https://en.wikipedia.org/wiki/Abbott_Laboratories and to copy that value in my Excel spread sheet.

I managed to do it using xml http which is a method I like for its speed, you can see the code below that is working.

The code is however not flexible enough as the structure of the wiki page can change, for example it doesn't work on this page: https://en.wikipedia.org/wiki/3M

as the tr td structure is not exactly the same (key people is no longer 8th TR for the 3M page)

How can I improve my code?

Public Sub parsehtml()

Dim http As Object, html As New HTMLDocument, topics As Object, titleElem As Object, detailsElem As Object, topic As HTMLHtmlElement
Dim i As Integer

Set http = CreateObject("MSXML2.XMLHTTP")



http.Open "GET", "https://en.wikipedia.org/wiki/Abbott_Laboratories", False

http.send

html.body.innerHTML = http.responseText

Set topic = html.getElementsByTagName("tr")(8)

Set titleElem = topic.getElementsByTagName("td")(0)

ThisWorkbook.Sheets(1).Cells(1, 1).Value = titleElem.innerText

End Sub

Upvotes: 2

Views: 954

Answers (2)

Ahmed AU
Ahmed AU

Reputation: 2777

If row of the table is not fixed for "Key people", then why don't loop the table for "Key people"

I tested with followings modification, it is found working correctly.

In declaration section

Dim topics As HTMLTable, Rw As HTMLTableRow

and then finally

html.body.innerHTML = http.responseText
Set topic = html.getElementsByClassName("infobox vcard")(0)

    For Each Rw In topic.Rows
        If Rw.Cells(0).innerText = "Key people" Then
        ThisWorkbook.Sheets(1).Cells(1, 1).Value = Rw.Cells(1).innerText
        Exit For
        End If
    Next

Upvotes: 2

QHarr
QHarr

Reputation: 84465

There is a better faster way. At least for given urls. Match on class name of element and index into returned nodeList. Less returned items to deal with, the path to the element is shorter, and matching with class name is faster than matching on element type.

Option Explicit
Public Sub GetKeyPeople()
    Dim html As HTMLDocument, body As String, urls(), i As Long, keyPeople
    Set html = New HTMLDocument
    urls = Array("https://en.wikipedia.org/wiki/Abbott_Laboratories", "https://en.wikipedia.org/wiki/3M")
    With CreateObject("MSXML2.XMLHTTP")
        For i = LBound(urls) To UBound(urls)
            .Open "GET", urls(i), False
            .send
            html.body.innerHTML = .responseText
            keyPeople = html.querySelectorAll(".agent").item(1).innerText
            ThisWorkbook.Worksheets("Sheet1").Cells(i + 1, 1).Value = keyPeople
        Next
    End With
End Sub

Upvotes: 1

Related Questions