Reputation: 23
I am trying to pick up "key people" field from a Wikipedia page: https://en.wikipedia.org/wiki/Abbott_Laboratories and to copy that value in my Excel spread sheet.
I managed to do it using xml http which is a method I like for its speed, you can see the code below that is working.
The code is however not flexible enough as the structure of the wiki page can change, for example it doesn't work on this page: https://en.wikipedia.org/wiki/3M
as the tr td structure is not exactly the same (key people is no longer 8th TR for the 3M page)
How can I improve my code?
Public Sub parsehtml()
Dim http As Object, html As New HTMLDocument, topics As Object, titleElem As Object, detailsElem As Object, topic As HTMLHtmlElement
Dim i As Integer
Set http = CreateObject("MSXML2.XMLHTTP")
http.Open "GET", "https://en.wikipedia.org/wiki/Abbott_Laboratories", False
http.send
html.body.innerHTML = http.responseText
Set topic = html.getElementsByTagName("tr")(8)
Set titleElem = topic.getElementsByTagName("td")(0)
ThisWorkbook.Sheets(1).Cells(1, 1).Value = titleElem.innerText
End Sub
Upvotes: 2
Views: 954
Reputation: 2777
If row of the table is not fixed for "Key people", then why don't loop the table for "Key people"
I tested with followings modification, it is found working correctly.
In declaration section
Dim topics As HTMLTable, Rw As HTMLTableRow
and then finally
html.body.innerHTML = http.responseText
Set topic = html.getElementsByClassName("infobox vcard")(0)
For Each Rw In topic.Rows
If Rw.Cells(0).innerText = "Key people" Then
ThisWorkbook.Sheets(1).Cells(1, 1).Value = Rw.Cells(1).innerText
Exit For
End If
Next
Upvotes: 2
Reputation: 84465
There is a better faster way. At least for given urls. Match on class name of element and index into returned nodeList. Less returned items to deal with, the path to the element is shorter, and matching with class name is faster than matching on element type.
Option Explicit
Public Sub GetKeyPeople()
Dim html As HTMLDocument, body As String, urls(), i As Long, keyPeople
Set html = New HTMLDocument
urls = Array("https://en.wikipedia.org/wiki/Abbott_Laboratories", "https://en.wikipedia.org/wiki/3M")
With CreateObject("MSXML2.XMLHTTP")
For i = LBound(urls) To UBound(urls)
.Open "GET", urls(i), False
.send
html.body.innerHTML = .responseText
keyPeople = html.querySelectorAll(".agent").item(1).innerText
ThisWorkbook.Worksheets("Sheet1").Cells(i + 1, 1).Value = keyPeople
Next
End With
End Sub
Upvotes: 1