Reputation: 81
I'm scraping the ECB's website for annual reports as a practice exercise. After collecting the href of every PDF link on the page, I get lots of strings like this:
https://www.ecb.europa.eu/pub/pdf/annrep/ar2016en.pdf?cb49eb74de9ddf1f55ebe03fb610d05b
https://www.ecb.europa.eu/pub/pdf/annrep/ar2015en.pdf?2e7998c5daf6a2a7e4bfccb41e81b504
https://www.ecb.europa.eu/pub/pdf/annrep/ar2014en.pdf?20def41d1b09b84d5889c707f92c9e4a
https://www.ecb.europa.eu/pub/pdf/annrep/ar2013en.pdf?fad3a17bf210c3c411c6e3c3121eb8a1
https://www.ecb.europa.eu/pub/pdf/annrep/ar2012en.pdf?40f7b4588f9adb8cf61ce44014c1b088
And so on.
Now I would like to click the link whose href CONTAINS the string the user submits (for example, I enter 2015 and it clicks the second href).
I tried InStr, but it only works if I enter the full href.
My code is this:
Sub prova()
Dim Ie As New SHDocVw.InternetExplorer
Dim Iedoc As MSHTML.HTMLDocument
Dim element As Object
Dim elements As MSHTML.IHTMLElementCollection
Dim parameter As String
parameter = "2015" 'i will insert application.inputbox
With Ie:
.navigate "https://www.ecb.europa.eu/pub/annual/html/index.en.html"
.Visible = True
End With
While Ie.readyState <> READYSTATE_COMPLETE Or Ie.Busy: DoEvents: Wend
Set Iedoc = Ie.document
Set elements = Iedoc.getElementsByClassName("pdf")
For Each element In elements:
If InStr(1, parameter, element) Then
element.Click
End If
Debug.Print element
Next element
End Sub
Upvotes: 1
Views: 190
Reputation: 84465
InStr expects a string, not an object, as the param to search in.
InStr([ start ], string1, string2, [ compare ])
The ordering is also:
string1 Required. String expression being searched.
string2 Required. String expression sought
Depending on which string you are searching for, and where it sits, you might choose InStrRev to search from the end of the source string for a faster match. Note the arguments are then:
InStrRev(stringcheck, stringmatch, [ start, [ compare ]])
Technically, I think it is a parameter in the signature but an argument when a value is passed, though someone can correct me if I'm wrong.
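To make the argument order concrete, here is a small sketch that runs in the Immediate window without a browser. The href is one of the sample strings from the question; the procedure and constant names are just illustrative.

```vba
Sub InStrOrderDemo()
    ' Sample href copied from the question
    Const HREF As String = "https://www.ecb.europa.eu/pub/pdf/annrep/ar2015en.pdf?2e7998c5daf6a2a7e4bfccb41e81b504"
    Const YEAR_TEXT As String = "2015"

    ' Correct order: search inside the href for the year
    Debug.Print InStr(1, HREF, YEAR_TEXT) > 0      ' True
    ' Reversed order: looks for the whole href inside "2015", so no match
    Debug.Print InStr(1, YEAR_TEXT, HREF) > 0      ' False
    ' InStrRev takes the string being searched first, then the string sought
    Debug.Print InStrRev(HREF, YEAR_TEXT) > 0      ' True
End Sub
```

The reversed call is what the loop in the question does, which is why it only matched when the full href was entered.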
You should use the href:
InStr(1, element.href, parameter) > 0
At a push you could use the outerHTML, but that gives you a larger search space and so is less efficient.
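As a rough sketch, the loop from the question could look like this with the href searched in the right order. It assumes the elements with class pdf are anchors exposing an href property (as the output in the question suggests); ClickFirstMatch is a hypothetical helper name, not something from the original code.

```vba
' Requires a reference to Microsoft HTML Object Library
Sub ClickFirstMatch(ByVal doc As MSHTML.HTMLDocument, ByVal parameter As String)
    Dim element As Object
    For Each element In doc.getElementsByClassName("pdf")
        ' Search inside the href for the user's text
        If InStr(1, element.href, parameter) > 0 Then
            Debug.Print element.href   ' read the property before clicking
            element.Click
            Exit For                   ' stop after the first match
        End If
    Next element
End Sub
```

It would be called from the existing procedure once the page has loaded, e.g. ClickFirstMatch Iedoc, parameter.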
It is more efficient still to let the DOM parser filter the results, using a CSS attribute = value selector with the contains (*), starts with (^), or ends with ($) operator. With the contains operator:
Iedoc.querySelector("[href*='" & parameter & "']").click
It would be safer to test for a longer substring in the href attribute, so something like:
param = 2015
Iedoc.querySelector(".doc-title [href*='/pub/annual/html/ar" & param & "']").click
Then you get rid of the entire loop.
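One caveat with the one-liner is that querySelector returns Nothing when no element matches, so calling .click directly would raise an error. A minimal guarded sketch, where ClickBySelector is a hypothetical helper name and the plain contains selector is used purely for illustration:

```vba
' Requires a reference to Microsoft HTML Object Library
Sub ClickBySelector(ByVal doc As MSHTML.HTMLDocument, ByVal parameter As String)
    Dim link As Object
    ' First element whose href attribute contains the user's text
    Set link = doc.querySelector("[href*='" & parameter & "']")
    If link Is Nothing Then
        Debug.Print "No href contains: " & parameter
    Else
        link.Click
    End If
End Sub
```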
Side-notes:
In your current loop you would also likely want an Exit For after a match is found.
Debug.Print element will, if a match is found, simply print [Object]. You would want to access a property of the element itself, e.g. .innerText. However, given you just clicked on it, you risk a stale element exception bubbling up (or some other error) if the element is no longer attached to the DOM.
Upvotes: 2