Reputation: 1
I'm trying to copy data from a website's source code and paste it into Excel.
The data looks like this:
`<p>
<span class="highlight">Mitarbeiter:</span> 120 in Deutschland<br/>
<span style="display: "><span class="highlight">Umsatzklasse:</span> 10 - 50 Mio. Euro<br/></span>
<span style="display: none"><span class="highlight">Filialen:</span> <br/></span>
<span style="display: inline"><span class="highlight">Gegründet:</span> 1925</span>
</p`
The value I'm looking for would be "10 - 50 Mio. Euro".
This is the code I've written so far:
Sub Sample() Dim IE As Object
Set IE = CreateObject("InternetExplorer.Application")
With IE
.Visible = False
.Navigate "https://www.wer-zu-wem.de/firma/steinel-normalien.html"
Do While .Busy And .readyState <> 4: DoEvents: Loop
Application.Wait Now + TimeValue("0:00:06")
Sheets("Dummy").Range("A1").Value = .document.body.outerHTML
.Quit
End With
End Sub
My approach was to first copy the entire HTML source code to a spreadsheet and then continue with the InStr function.
This just gives me an excerpt from the full source code, which does not contain the data I'm looking for.
Is there a better approach, maybe to get the data I'm looking for directly from the website without the step in between?
Edit: Tried to implement Nathan_Sav's input. Ended up with the following
Public Sub IE_Automation()
Dim IE As InternetExplorer
Dim HTMLdoc As HTMLDocument
Dim d As MSHTML.HTMLDocument
Dim y As MSHTML.IHTMLElementCollection
Dim x As MSHTML.IHTMLSpanElement
Set IE = New InternetExplorer
With IE
.Visible = False
.Navigate "https://www.wer-zu-wem.de/firma/steinel-normalien.html"
While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend
Set d = .document
Set y = d.getElementsByTagName("Span")
For Each x In y
If (x = ) Then
Else
End If
Next x
End With
End Sub
I realize I am looping through the objects received from the website. What exactly do I compare the objects to though to get the data I need?
Upvotes: 0
Views: 653
Reputation: 8531
Look at using the HTML Object Library
then you can use something like the below:
Dim d As MSHTML.HTMLDocument
Dim y As MSHTML.IHTMLElementCollection
Dim x As MSHTML.IHTMLSpanElement
Set y = d.getElementsByTagName("Span")
For Each x In y
Next x
Something like this:
With IE
.Visible = True
.navigate "https://www.wer-zu-wem.de/firma/steinel-normalien.html"
While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend
Set d = .document
Set y = d.getElementsByClassName("highlight")
For Each x In y
If x.className = "highlight" Then
If x.innerHTML = "Umsatzklasse:" Then
Debug.Print x.ID, x.innerHTML, x.innerText, x.NextSibling.NodeValue
End If
End If
Next x
End With
Upvotes: 1