zowie
zowie

Reputation: 1

How to extract data from websites source code?

I'm trying to copy data from a website's source code and paste it into Excel.

The data looks like this:

`<p>
    <span class="highlight">Mitarbeiter:</span> 120 in Deutschland<br/>
    <span style="display: "><span class="highlight">Umsatzklasse:</span> 10 - 50 Mio. Euro<br/></span>
    <span style="display: none"><span class="highlight">Filialen:</span> <br/></span>
    <span style="display: inline"><span class="highlight">Gegründet:</span> 1925</span>
</p`

The value I'm looking for would be "10 - 50 Mio. Euro".

This is the code I've written so far:

Sub Sample()    Dim IE As Object
    Set IE = CreateObject("InternetExplorer.Application")
    With IE
        .Visible = False
        .Navigate "https://www.wer-zu-wem.de/firma/steinel-normalien.html"
                 Do While .Busy And .readyState <> 4: DoEvents: Loop
         Application.Wait Now + TimeValue("0:00:06")
       Sheets("Dummy").Range("A1").Value = .document.body.outerHTML
        .Quit
    End With
End Sub

My approach was to first copy the entire HTML source code to a spreadsheet and then continue with the InStr function.

This just gives me an excerpt from the full source code, which does not contain the data I'm looking for.

Is there a better approach, maybe to get the data I'm looking for directly from the website without the step in between?

Edit: Tried to implement Nathan_Sav's input. Ended up with the following

Public Sub IE_Automation()

    Dim IE As InternetExplorer
    Dim HTMLdoc As HTMLDocument

    Dim d As MSHTML.HTMLDocument
    Dim y As MSHTML.IHTMLElementCollection
    Dim x As MSHTML.IHTMLSpanElement

    Set IE = New InternetExplorer

    With IE
        .Visible = False
        .Navigate "https://www.wer-zu-wem.de/firma/steinel-normalien.html"
        While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend

        Set d = .document
        Set y = d.getElementsByTagName("Span")
        For Each x In y
            If (x = ) Then
            Else
            End If
        Next x

    End With

End Sub

I realize I am looping through the objects received from the website. What exactly do I compare the objects to though to get the data I need?

Upvotes: 0

Views: 653

Answers (1)

Nathan_Sav
Nathan_Sav

Reputation: 8531

Look at using the HTML Object Library then you can use something like the below:

Dim d As MSHTML.HTMLDocument
Dim y As MSHTML.IHTMLElementCollection
Dim x As MSHTML.IHTMLSpanElement

Set y = d.getElementsByTagName("Span")

For Each x In y


Next x

Something like this:

With IE
    .Visible = True
    .navigate "https://www.wer-zu-wem.de/firma/steinel-normalien.html"
    While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend

    Set d = .document
    Set y = d.getElementsByClassName("highlight")
    For Each x In y
        If x.className = "highlight" Then   
            If x.innerHTML = "Umsatzklasse:" Then
                Debug.Print x.ID, x.innerHTML, x.innerText, x.NextSibling.NodeValue          
            End If
        End If
    Next x

End With

Upvotes: 1

Related Questions