Reputation: 22440
Can't figure out how can I get all the company links from the page used in my code. Running my script I get only 20 links. The page has got lazyloading method that is why I can't get all of them. Any input on this will be highly appreciated. I've tried so far with:
Sub Company_links()
Const lnk = "http://fortune.com"
Dim http As New XMLHTTP60, html As New HTMLDocument
Dim topic As Object
With http
.Open "GET", "http://fortune.com/fortune500/list/", False
.send
html.body.innerHTML = .responseText
End With
For Each topic In html.getElementsByClassName("small-12 column row")
x = x + 1
With topic.getElementsByTagName("a")
If .Length Then Cells(x, 1) = lnk & Split(.item(0).href, "about:")(1)
End With
Next topic
Set html = Nothing: Set topics = Nothing
End Sub
Upvotes: 0
Views: 113
Reputation: 20302
I would do it like this.
Option Explicit
Sub Sample()
Dim ie As Object
Dim links As Variant, lnk As Variant
Dim rowcount As Long
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = True
ie.navigate "http://fortune.com"
'Wait for site to fully load
'ie.Navigate2 URL
Do While ie.Busy = True
DoEvents
Loop
Set links = ie.document.getElementsByTagName("a")
rowcount = 1
With Sheets("Sheet1")
For Each lnk In links
'Debug.Print lnk.innerText
'If lnk.classname Like "*Real Statistics Examples Part 1*" Then
.Range("A" & rowcount) = lnk.innerText
rowcount = rowcount + 1
'Exit For
'End If
Next
End With
End Sub
Upvotes: 0
Reputation: 2556
Run the following code in a new workbook. It will output to results to Sheet1 regardless of they are empty or not, so be careful if you have data there. You can change this part of code later as you like.
First of all you need to activate Microsoft HTML Object Library
and Microsoft Internet Controls
from Tools -> References
in VBA Editor. Then run the following code, sit back and relax until you see "All Done!" message:
Sub Company_links()
Dim i As Long
Dim aIE As InternetExplorer
Dim Rank As IHTMLElement, Company As IHTMLElement, Revenues As IHTMLElement
Set aIE = New InternetExplorer
With aIE
.navigate "http://fortune.com/fortune500/list/"
.Visible = True
End With
Do While (aIE.Busy Or aIE.ReadyState <> READYSTATE_COMPLETE)
DoEvents
Loop
For i = 1 To 50
On Error Resume Next
Set Rank = aIE.document.getElementsByClassName("column small-2 company-rank")(999)
If Rank Is Nothing Then
GoTo Skip
End If
Exit For
Skip:
SendKeys "{end}"
Application.Wait (Now() + TimeValue("00:00:005"))
Next i
With Sheet1
.Range("A1") = "RANK"
.Range("B1") = "COMPANY"
.Range("C1") = "REVENUE"
For i = 0 To 999
Set Rank = aIE.document.getElementsByClassName("column small-2 company-rank")(i)
Set Company = aIE.document.getElementsByClassName("column small-5 company-title")(i)
Set Revenues = aIE.document.getElementsByClassName("column small-5 company-revenue")(i)
.Range("A" & i + 2) = Rank.innerText
.Range("B" & i + 2) = Company.innerText
.Range("C" & i + 2) = Revenues.innerText
Next i
End With
SendKeys "%{F4}"
Set aIE = Nothing
Set Rank = Nothing
Set Company = Nothing
Set Revenues= Nothing
MsgBox "All Done!"
End Sub
Upvotes: 1
Reputation: 1025
If the site is using ajax to load the remaining links. You would need to make the page load those remaining links first. My suggestion would be to use selenium to load the page and then use your code to get the links.
http://selenium-python.readthedocs.io/
Upvotes: 0