Reputation: 775
I am slowly exploring if I can use VBA to code a macro that will search a website from a list of keywords/codes in column A and extract the data. Currently The code below searches the desired website using the range in ("A1") only but does get to the right page with the data I wish to extract. In this case the Code in a1 is 100-52-7
Sub BrowseToSite()
Dim IE As New SHDocVw.InternetExplorer
Dim HTMLDoc As MSHTML.HTMLDocument
IE.Visible = True
IE.Navigate "https://apps.who.int/food-additives-contaminants-jecfa-database/Search.aspx"
Do While IE.ReadyState <> READYSTATE_COMPLETE
Loop
IE.Document.forms("form1").Elements("ctl00$ContentPlaceHolder1$txtSearch").Value = Range("a1").Value
IE.Document.forms("form1").Elements("ctl00$ContentPlaceHolder1$btnSearch").Click
Set HTMLDoc = IE.Document
'Sheet1.Range("B1").Value = HTMLDoc.getElementsByClassName("sectionHead1")(0).innerText
End Sub
Now I wish to pull the "0-5 mg/kg bw (1996)" phrase on this page into Excel. I planned to do this by retriving the inner text within the class name however I run into an error Object Variable or With Block variable not set
with the following line:
Sheet1.Range("B1").Value = HTMLDoc.getElementsByClassName("sectionHead1")(0).innerText
Upvotes: 2
Views: 1241
Reputation: 22440
You can get rid of IE altogether and try using xmlhttp requests to make the script robust. What the following script does is send a get http requests first to scrape the value of certain parameters supposed to be used within post requests and then issue a post requests to parse the desired content.
This is one of the efficient ways how you can:
Option Explicit
Public Sub GetContent()
Const Url = "https://apps.who.int/food-additives-contaminants-jecfa-database/Search.aspx"
Dim oHttp As Object, oHtml As HTMLDocument, MyDict As Object
Dim DictKey As Variant, payload$, searchKeyword$
Set oHtml = New HTMLDocument
Set oHttp = CreateObject("MSXML2.XMLHTTP")
Set MyDict = CreateObject("Scripting.Dictionary")
'send get requests first to parse the value of "__VIEWSTATE", "__VIEWSTATEGENERATOR" e.t.c., as in oHtml.getElementById("__VIEWSTATE").Value
With oHttp
.Open "GET", Url, False
.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36"
.send
oHtml.body.innerHTML = .responseText
End With
searchKeyword = "100-52-7" 'this is the search keyword you wanna use from your predefined search terms
'MyDict stores keys and values within dictionary, as in __VIEWSTATE = "some value" and so on
MyDict("__VIEWSTATE") = oHtml.getElementById("__VIEWSTATE").Value
MyDict("__VIEWSTATEGENERATOR") = oHtml.getElementById("__VIEWSTATEGENERATOR").Value
MyDict("__EVENTVALIDATION") = oHtml.getElementById("__EVENTVALIDATION").Value
MyDict("ctl00$ContentPlaceHolder1$txtSearch") = searchKeyword
MyDict("ctl00$ContentPlaceHolder1$btnSearch") = "Search"
MyDict("ctl00$ContentPlaceHolder1$txtSearchFEMA") = ""
'joining each set of key and value with ampersand to make it a string so that you can use it as a parameter while issuing post requests, which is what payload is doing
payload = ""
For Each DictKey In MyDict
payload = IIf(Len(DictKey) = 0, WorksheetFunction.encodeURL(DictKey) & "=" & WorksheetFunction.encodeURL(MyDict(DictKey)), _
payload & "&" & WorksheetFunction.encodeURL(DictKey) & "=" & WorksheetFunction.encodeURL(MyDict(DictKey)))
Next DictKey
With oHttp
.Open "POST", Url, False
.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36"
.setRequestHeader "Content-type", "application/x-www-form-urlencoded"
.send (payload)
oHtml.body.innerHTML = .responseText
End With
MsgBox oHtml.querySelector("#SearchResultItem > a").NextSibling.NodeValue
End Sub
Make sure to add the following libraries to execute the above script:
Microsoft XML, v6.0
Microsoft Scripting Runtime
Microsoft HTML Object Library
Upvotes: 3
Reputation: 10139
You click on an element with this line of code:
IE.Document.forms("form1").Elements("ctl00$ContentPlaceHolder1$btnSearch").Click
for which IE makes a POST request to retrieve your results, as can be seen here:
The above is a screen shot from Edge's dev tools, but concept is the same
During this request, the element in question is not immediately there, so you will need to wait for it to load.
Your prior method of
Do While IE.ReadyState <> READYSTATE_COMPLETE
Loop
would probably work, but I find it to be inconsistent at times and would also include checking the .Busy
property as well.
Try using this after your click:
IE.Document.forms("form1").Elements("ctl00$ContentPlaceHolder1$btnSearch").Click
'~~WAIT FOR SEARCH RESULTS TO LOAD~~
Do While IE.ReadyState < READYSTATE_COMPLETE Or IE.Busy
Loop
Set HTMLDoc = IE.Document
Sheet1.Range("B1").Value = HTMLDoc.getElementsByClassName("sectionHead1")(0).innerText
If you're still having issues, you can force IE to wait for the element in question to become available by doing this:
On Error Resume Next
Do while HTMLDoc.getElementsByClassName("sectionHead1")(0) is Nothing
Loop
On Error Goto 0
Set HTMLDoc = IE.Document
Sheet1.Range("B1").Value = HTMLDoc.getElementsByClassName("sectionHead1")(0).innerText
This is a simple loop that checks for the object, and will continue to loop until that object is no longer Nothing
(which essentially means it has loaded).
And I would recommend that you add some sort of timeout that may trigger an error or something just in case the webpage is having issues so you're not in an infinite loop.
Pro Tip:
If you are clicking the search button a lot of times and waiting for a lot of objects to load, instead of duplicating the above code you can turn it into it's own sub and do something like:
Sub WaitForElement(IE as InternetExplorer, elem As Object) Do While IE.ReadyState < 4 Or IE.Busy: Loop On Error Resume Next Do While elem is Nothing: Loop On error Goto 0 End Sub
Then you would just need to use the following line after each click:
WaitForElement IE, HTMLDoc.getElementsByClassName("sectionHead1")(0)
Not only would this cut down on the number of lines in your code, it could greatly improve readability as well.
Upvotes: 1