Sharid

Data extraction from HTML

I am trying to pull data from HTML text, but I am having an issue with the extraction code.

Normally I deal with div or li elements; this HTML seems to be a bit more complicated, since it uses a div id, a ul class and a span class.

What do I put in for the class or li extraction?

[Image: the HTML being parsed]

For Each li In HTMLdoc.getElementsByTagName("li")
    If li.getAttribute("class") = "a-link-normal" Then 
        Set link = li.getElementsByTagName("a")(0)
        .Cells(i, 1).Value = link.getAttribute("href")
        i = i + 1
    End If
Next li 

I have also posted this here.

The new code from PEH seems to work.

[Image: URL]

However, I am getting an error message:

[Image: Error message 1]

[Image: Error line in code]

[Image: Error in code]

Answers (2)

QHarr

It is simpler and faster to use the class directly. The CSS class selector "." shown below is combined with the attribute selector [href], so you only retrieve elements that have that class and also have an href attribute.

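Dim items As Object
Dim i As Long

' Every element with class "a-link-normal" that also has an href attribute
Set items = HTMLdoc.querySelectorAll(".a-link-normal[href]")

With ThisWorkbook.Worksheets("Sheet1")   ' assumed target sheet; adjust to your own
    For i = 0 To items.Length - 1
        ' querySelectorAll returns a 0-based collection; worksheet rows start at 1
        .Cells(i + 1, 1).Value = items.item(i).href
    Next i
End With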
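If you want to try the selector approach on its own, a minimal self-contained sketch could look like the function below. It assumes a reference to the Microsoft HTML Object Library (Tools > References), so that MSHTML.HTMLDocument and querySelectorAll are available; the function name GetNormalLinkHrefs is hypothetical. Instead of writing to a sheet, it returns the href values in a Collection, which makes the result easy to check.

```vba
' Assumes: Tools > References > "Microsoft HTML Object Library" is ticked.
' GetNormalLinkHrefs is a hypothetical helper name used for illustration.
Public Function GetNormalLinkHrefs(ByVal htmlText As String) As Collection
    Dim doc As MSHTML.HTMLDocument
    Dim items As Object
    Dim result As Collection
    Dim i As Long

    Set result = New Collection

    ' Load the raw HTML into an in-memory document
    Set doc = New MSHTML.HTMLDocument
    doc.body.innerHTML = htmlText

    ' Class selector plus [href]: only elements with that class AND an href
    Set items = doc.querySelectorAll(".a-link-normal[href]")

    ' The node list is 0-based; the Collection is 1-based
    For i = 0 To items.Length - 1
        result.Add items.Item(i).getAttribute("href")
    Next i

    Set GetNormalLinkHrefs = result
End Function
```

For example, GetNormalLinkHrefs(html)(1) gives the first matching link. Note that a class selector also matches elements that carry several classes, such as class="a-link-normal s-title", which a plain getAttribute("class") = "a-link-normal" comparison would miss.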

Pᴇʜ

With the line If li.getAttribute("class") = "a-link-normal" Then, you check whether the current li itself has the class a-link-normal, as in <li class="a-link-normal">. But in your HTML it is actually the link element (<a>) that has the class a-link-normal, not the list element. So I think it should be something like this:

For Each li In HTMLdoc.getElementsByTagName("li")        
    Set link = li.getElementsByTagName("a")(0)
    If link.getAttribute("class") = "a-link-normal" Then 
        .Cells(i, 1).Value = link.getAttribute("href")
        i = i + 1
    End If
Next li 

You might come across <li> elements that have no link <a> inside, so check that a link was found before using it:

For Each li In HTMLdoc.getElementsByTagName("li")
    Set link = Nothing
    ' Not every <li> contains an <a>; don't let a missing link stop the loop
    On Error Resume Next
    Set link = li.getElementsByTagName("a")(0)
    On Error GoTo 0

    ' Only read the class and href when a link was actually found
    If Not link Is Nothing Then
        If link.getAttribute("class") = "a-link-normal" Then
            .Cells(i, 1).Value = link.getAttribute("href")
            i = i + 1
        End If
    End If
Next li
