Lara M.

Reputation: 855

Click on a JavaScript element with Scrapy + Selenium

I'm trying to scrape this page: http://www.newyorkerfiction.com/Pieces.aspx with Scrapy and Selenium. I need to click through the pagination links to reach the other pages, but I cannot find a way to do it. My script is:

def __init__(self):
    self.driver = webdriver.PhantomJS(executable_path='/usr/local/bin/phantomjs')
    self.driver.set_window_size(1920, 1080); #Size

def parse(self, response):
    self.driver.get(response.url)
    element = self.driver.find_element(By.XPATH, '//div[@class="rgWrap rgNumPart"]//a[contains(@href, "javascript:__doPostBack")]')
    self.driver.execute_script("arguments[0].click();", element)
    self.driver.save_screenshot('screenshot.png')
    for sel in response.xpath('//body'):
        item = NyfictionItem()
        item["title"] = sel.xpath('//td[@class="title"]').extract()
        yield item
    self.driver.close()

I don't understand what's wrong, since my understanding is that execute_script lets Selenium interact with JavaScript-driven elements. I tested the XPath and it seems right.

Any ideas?

Thanks

Upvotes: 2

Views: 277

Answers (1)

alecxe

Reputation: 474231

One problem is that your locator matches all the links in the pagination bar. Since find_element returns only the first match, you are actually clicking the "1" link, whereas you meant to click the "next page" link, which can be located with the input.rgPageNext CSS selector.

You should also wait for it to be visible and clickable, which makes the process more reliable:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


driver = webdriver.PhantomJS(executable_path='/usr/local/bin/phantomjs')
driver.set_window_size(1920, 1080)

driver.get("http://www.newyorkerfiction.com/Pieces.aspx")

# Wait up to 10 seconds for the "next page" button to become clickable, then click it
wait = WebDriverWait(driver, 10)
next_link = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input.rgPageNext")))
next_link.click()

driver.save_screenshot('screenshot.png')

driver.close()

Note that you might need another wait after clicking the "next page" link, to let the results of the new page load.

You would also need some additional logic to stop at the last page.
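As a rough sketch of those last two points (not tested against the actual site), the loop below clicks "next page", waits until the grid shows different titles, and stops when the button is missing or disabled, or when nothing changes after a click. The names iter_pages, page_titles and timeout_error are hypothetical. The td.title selector is taken from the XPath in the question, and the idea that the last page can be detected this way is an assumption about how the grid behaves. The functions only rely on find_elements and a WebDriverWait-style until(), so the string "css selector" (the value of By.CSS_SELECTOR) is used directly to keep the block free of imports.

```python
CSS = "css selector"  # the string value behind selenium's By.CSS_SELECTOR


def page_titles(driver):
    """Return the text of every title cell currently shown in the grid."""
    return [cell.text for cell in driver.find_elements(CSS, "td.title")]


def iter_pages(driver, wait, timeout_error=Exception, max_pages=500):
    """Yield the list of titles for each page, clicking "next page" in between.

    `wait` is a WebDriverWait-like object exposing until(callable).
    `timeout_error` is the exception it raises on timeout
    (TimeoutException when a real WebDriverWait is used).
    """
    for _ in range(max_pages):  # hard cap as a safety net against endless loops
        titles = page_titles(driver)
        yield titles

        buttons = driver.find_elements(CSS, "input.rgPageNext")
        if not buttons or not buttons[0].is_enabled():
            return  # no usable "next page" button: assume this is the last page

        buttons[0].click()

        def grid_changed(d):
            # True once the grid holds a non-empty, different set of titles
            new_titles = page_titles(d)
            return bool(new_titles) and new_titles != titles

        try:
            wait.until(grid_changed)  # the extra wait after the click
        except timeout_error:
            return  # nothing changed after the click: assume last page
```

With a real driver this would be called as iter_pages(driver, WebDriverWait(driver, 10), timeout_error=TimeoutException), with TimeoutException imported from selenium.common.exceptions. Since the cells can go stale while the grid is being replaced, it is probably worth building the wait with ignored_exceptions=[StaleElementReferenceException] as well.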

Upvotes: 3
