Reputation: 1548
I'm studying Selenium and want to extract the text and links of Sympla's events. After I click the "more events" button, I can't extract the newly loaded events: the code always extracts the same initial events from the page.
Complete class for easy reproduction.
public static void main(String[] args) throws InterruptedException {
    WebDriverManager.firefoxdriver().setup();
    WebDriver driver = new FirefoxDriver();
    driver.manage().window().maximize();
    driver.get("https://www.sympla.com.br/eventos?ts=online_mais-de-3-mil-eventos-online");
    driver.manage().timeouts().implicitlyWait(10, TimeUnit.SECONDS);
    // If have captcha, close the page and exit.
    boolean captcha = driver.getPageSource().contains("Não sou um robô");
    if (captcha == true) {
        System.out.println("O Captcha apareceu, acabou a brincadeira!");
        driver.close();
        driver.quit();
    }
    // load more button
    WebElement CarregarMais = driver.findElement(By
            .xpath("//button[@id='more-events']"));
    // Number of events counter
    List<WebElement> eventos = (List<WebElement>) driver.findElements(By
            .cssSelector("div.event-name.event-card"));
    System.out.println("Number of links: " + eventos.size());
    // Number of links counter
    List<WebElement> eventos_link = (List<WebElement>) driver
            .findElements(By.cssSelector("a.sympla-card.w-inline-block"));
    // iterating over the button more events
    for (int j = 0; j < eventos.size(); j++) {
        CarregarMais.click();
        @SuppressWarnings("deprecation")
        WebDriverWait wait = new WebDriverWait(driver, 10);
        WebElement element = wait.until(ExpectedConditions
                .elementToBeClickable(By
                        .xpath("//button[@id='more-events']")));
        // Iterating over event links
        for (int i = 0; i < eventos_link.size(); i++) {
            System.out.println(i + " " + eventos.get(i).getText() + " - "
                    + eventos_link.get(i).getAttribute("href"));
            Thread.sleep(500);
        }
    }
}
Upvotes: 0
Views: 41
Reputation: 528
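It's because you don't read the links again. Each click on the button changes the page content, and the lists returned by findElements only contain the elements that existed when it was called, so you need to read them again after every click.
Furthermore, you need to remember the last link you fetched so that earlier events are not printed again.
So after waiting for the button to be clickable again, re-read eventos and eventos_link. You could also keep a variable such as lastFetchedLinkIndex (named lastEventScraped in the code below) that records where the previous pass stopped.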
This would be my approach (adjusted your code):
WebDriverManager.firefoxdriver().setup();
WebDriver driver = new FirefoxDriver();
driver.manage().window().maximize();
driver.get("https://www.sympla.com.br/eventos?ts=online_mais-de-3-mil-eventos-online");
driver.manage().timeouts().implicitlyWait(10, TimeUnit.SECONDS);
// If a captcha is shown, close the browser and stop.
boolean captcha = driver.getPageSource().contains("Não sou um robô");
if (captcha) {
    System.out.println("O Captcha apareceu, acabou a brincadeira!");
    driver.quit();
    // Leave main so nothing below touches the closed driver.
    return;
}
// "Load more" button
WebElement CarregarMais = driver.findElement(By
        .xpath("//button[@id='more-events']"));
// Events present on the initial page
List<WebElement> eventos = driver.findElements(By
        .cssSelector("div.event-name.event-card"));
System.out.println("Number of events: " + eventos.size());
// Links present on the initial page
List<WebElement> eventos_link = driver
        .findElements(By.cssSelector("a.sympla-card.w-inline-block"));
// Index of the first event that has not been printed yet
int lastEventScraped = 0;
@SuppressWarnings("deprecation")
WebDriverWait wait = new WebDriverWait(driver, 10);
// Click "load more" repeatedly
for (int j = 0; j < eventos.size(); j++) {
    CarregarMais.click();
    // Wait until the button is clickable again and keep the fresh reference
    CarregarMais = wait.until(ExpectedConditions
            .elementToBeClickable(By
                    .xpath("//button[@id='more-events']")));
    // Re-read both lists so they include the newly loaded events
    eventos = driver.findElements(By
            .cssSelector("div.event-name.event-card"));
    eventos_link = driver
            .findElements(By.cssSelector("a.sympla-card.w-inline-block"));
    // Stay within both lists in case their sizes differ
    int total = Math.min(eventos.size(), eventos_link.size());
    // Print only the events added since the previous pass
    for (int i = lastEventScraped; i < total; i++, lastEventScraped++) {
        System.out.println(i + " " + eventos.get(i).getText() + " - "
                + eventos_link.get(i).getAttribute("href"));
        Thread.sleep(500);
    }
}
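The index bookkeeping can be checked without a browser. The following is only a sketch under stated assumptions: FakePage, clickLoadMore, findCards and IncrementalScrape are hypothetical stand-ins invented for illustration, not Selenium or Sympla APIs. FakePage mimics a page that appends cards on each click, and findCards mimics findElements by returning a snapshot taken at call time.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the page: every "click" appends more cards.
class FakePage {
    private final List<String> cards = new ArrayList<>();
    private int nextId = 1;

    FakePage(int initialCards) {
        addCards(initialCards);
    }

    // Simulates clicking the "more events" button.
    void clickLoadMore(int howMany) {
        addCards(howMany);
    }

    // Returns a copy, so the result is a snapshot taken at call time.
    List<String> findCards() {
        return new ArrayList<>(cards);
    }

    private void addCards(int n) {
        for (int k = 0; k < n; k++) {
            cards.add("event-" + nextId++);
        }
    }
}

class IncrementalScrape {
    // Re-queries the page after every click and collects only unseen cards.
    static List<String> scrape(FakePage page, int clicks, int perClick) {
        List<String> scraped = new ArrayList<>();
        int lastFetchedIndex = 0; // first index that has not been collected yet
        for (int j = 0; j <= clicks; j++) {
            if (j > 0) {
                page.clickLoadMore(perClick);
            }
            List<String> current = page.findCards(); // fresh snapshot
            for (int i = lastFetchedIndex; i < current.size(); i++) {
                scraped.add(current.get(i));
            }
            lastFetchedIndex = current.size();
        }
        return scraped;
    }

    public static void main(String[] args) {
        FakePage page = new FakePage(3);
        List<String> stale = page.findCards();
        page.clickLoadMore(2);
        // The old snapshot does not grow; a new query sees the added cards.
        System.out.println(stale.size() + " vs " + page.findCards().size());
        System.out.println(scrape(new FakePage(3), 2, 2));
    }
}
```

The stale list in main corresponds to the question's original eventos and eventos_link, which were read once before the loop; scrape corresponds to re-reading them after each click and resuming from the stored index.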
Upvotes: 1