Data extraction from HTML

Question

I am trying to pull data from html text.

I am having an issue with the extraction code.

Normally I deal with div or li elements, but this HTML seems to be a bit more complicated.
It uses a div id, a ul class and a span class.

What do I put in for Class or Li extraction?

For Each li In HTMLdoc.getElementsByTagName("li")
    If li.getAttribute("class") = "a-link-normal" Then 
        Set link = li.getElementsByTagName("a")(0)
        .Cells(i, 1).Value = link.getAttribute("href")
        i = i + 1
    End If
Next li

I have also posted this here.

The new code from PEH seems to work.

However I am getting an error message.


Pᴇʜ · Accepted Answer

With the line If li.getAttribute("class") = "a-link-normal" Then you check whether the current li element itself has the class a-link-normal.

But it is actually the link (a) element inside the li that has the class a-link-normal, not the list element. So I think it should be something like this:

For Each li In HTMLdoc.getElementsByTagName("li")        
    Set link = li.getElementsByTagName("a")(0)
    If link.getAttribute("class") = "a-link-normal" Then 
        .Cells(i, 1).Value = link.getAttribute("href")
        i = i + 1
    End If
Next li

You might come across li elements that have no links inside. In that case, skip them so the code does not fail on a missing link:

For Each li In HTMLdoc.getElementsByTagName("li")
    Set link = Nothing
    On Error Resume Next        
    Set link = li.getElementsByTagName("a")(0)
    On Error GoTo 0

    If Not link Is Nothing Then 
        If link.getAttribute("class") = "a-link-normal" Then 
            .Cells(i, 1).Value = link.getAttribute("href")
            i = i + 1
        End If
    End If
Next li
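
As a possible alternative to On Error Resume Next, the sketch below checks the Length of the collection returned by getElementsByTagName before reading the first link. It also matches a-link-normal as one class among several, on the assumption that the real page may put more than one class on the a element, in which case a plain equality test on the class attribute would fail. The sample markup, the example.com URL and the Sub name ListNormalLinks are made up for illustration, and Debug.Print stands in for .Cells(i, 1).Value because the surrounding With block is not shown in the question.

Sub ListNormalLinks()
    Dim doc As Object, li As Object, links As Object, link As Object
    Dim i As Long

    ' Hypothetical sample markup; in the real code HTMLdoc already holds the page.
    Set doc = CreateObject("htmlfile")
    doc.body.innerHTML = "<ul>" & _
        "<li><a class=""a-link-normal a-text-normal"" href=""https://example.com/item1"">Item 1</a></li>" & _
        "<li>No link here</li>" & _
        "</ul>"

    i = 1
    For Each li In doc.getElementsByTagName("li")
        Set links = li.getElementsByTagName("a")

        ' Only continue if this li actually contains a link
        If links.Length > 0 Then
            Set link = links.Item(0)

            ' Pad with spaces so that only a whole class name matches
            If InStr(" " & link.className & " ", " a-link-normal ") > 0 Then
                Debug.Print i, link.getAttribute("href")
                i = i + 1
            End If
        End If
    Next li
End Sub

To use this in the original loop, replace doc with HTMLdoc, drop the sample markup, and write to .Cells(i, 1).Value instead of Debug.Print.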
