Reputation: 161
I am trying to pull data from HTML text, and I am having an issue with the extraction code.
Normally I deal with div or li elements, but this HTML seems a bit more complicated: it uses a div id, a ul class and a span class.
What do I put in for the class or li extraction?
For Each li In HTMLdoc.getElementsByTagName("li")
    If li.getAttribute("class") = "a-link-normal" Then
        Set link = li.getElementsByTagName("a")(0)
        .Cells(i, 1).Value = link.getAttribute("href")
        i = i + 1
    End If
Next li
I have also posted this here.
The new code from PEH seems to work.
However I am getting an error message.
[Screenshot: error line in code]
Upvotes: 1
Views: 117
Reputation: 84465
It is simpler and faster to select by the class directly. The CSS class selector "." shown below is combined with the href attribute selector [href], so you only retrieve elements that match that class and also have an href attribute:
' Assumes this runs inside the same With block as the original code
Dim items As Object
Set items = HTMLdoc.querySelectorAll(".a-link-normal[href]")
For i = 0 To items.Length - 1
    .Cells(i + 1, 1).Value = items.item(i).href
Next i
Upvotes: 1
Reputation: 57683
With this code, If li.getAttribute("class") = "a-link-normal" Then, you check whether the current li has the class attribute a-link-normal, like <li class="a-link-normal">.
But a-link-normal is actually the class of a link element (<a>), not of a list element. So I think it should be something like this:
For Each li In HTMLdoc.getElementsByTagName("li")
    Set link = li.getElementsByTagName("a")(0)
    If link.getAttribute("class") = "a-link-normal" Then
        .Cells(i, 1).Value = link.getAttribute("href")
        i = i + 1
    End If
Next li
You might come across <li> elements that have no <a> links inside. In that case, guard the lookup so that link stays Nothing and the item is skipped:
For Each li In HTMLdoc.getElementsByTagName("li")
    Set link = Nothing
    On Error Resume Next
    Set link = li.getElementsByTagName("a")(0)
    On Error GoTo 0
    If Not link Is Nothing Then
        If link.getAttribute("class") = "a-link-normal" Then
            .Cells(i, 1).Value = link.getAttribute("href")
            i = i + 1
        End If
    End If
Next li
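As a possible alternative to the On Error Resume Next guard, you could check how many <a> elements each <li> contains before taking the first one. The following is only a minimal sketch under some assumptions that are not from the question: the sub name ListLinksWithoutErrorTrap is made up, the small in-memory document and its example.com URLs stand in for the real page, the output goes to the active worksheet, and className is used in place of getAttribute("class").

```vba
Option Explicit

' Hypothetical sub name; a sketch, not the code from the question.
Sub ListLinksWithoutErrorTrap()
    Dim HTMLdoc As Object, li As Object, anchors As Object, link As Object
    Dim i As Long

    ' Assumption: a small in-memory document stands in for the real page.
    Set HTMLdoc = CreateObject("htmlfile")
    HTMLdoc.body.innerHTML = _
        "<ul>" & _
        "<li><a class=""a-link-normal"" href=""https://example.com/one"">One</a></li>" & _
        "<li>No link here</li>" & _
        "<li><a class=""other"" href=""https://example.com/two"">Two</a></li>" & _
        "</ul>"

    i = 1
    ' Assumption: the active sheet is a worksheet that may be overwritten.
    With ActiveSheet
        For Each li In HTMLdoc.getElementsByTagName("li")
            Set anchors = li.getElementsByTagName("a")
            ' Skip list items that contain no link at all.
            If anchors.Length > 0 Then
                Set link = anchors.Item(0)
                If link.className = "a-link-normal" Then
                    .Cells(i, 1).Value = link.href
                    i = i + 1
                End If
            End If
        Next li
    End With
End Sub
```

Checking Length avoids switching error handling off and on inside the loop, so an unrelated runtime error in that spot would still surface instead of being silently skipped.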
Upvotes: 1