Reputation: 91

Simulating a JavaScript button click with Scrapy

I want to run a Scrapy crawler on this web page: http://visit.rio/en/o-que-fazer/outdoors/ . However, some resources inside id="container" load only after clicking a JavaScript button ("VER MAIS"). I've read a bit about Selenium, but I haven't gotten anywhere with it.

Upvotes: 2

Answers (1)

Rafael Almeida

Reputation: 5240

You read right: your best bet is Scrapy + Selenium, using a Firefox browser or a headless one like PhantomJS for faster scraping.

Example adapted from https://stackoverflow.com/a/17979285/2781701

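import scrapy
from selenium import webdriver
from selenium.common.exceptions import WebDriverException
from selenium.webdriver.common.by import By


class ProductSpider(scrapy.Spider):
    name = "product_spider"
    allowed_domains = ['visit.rio']
    start_urls = ['http://visit.rio/en/o-que-fazer/outdoors']

    def __init__(self, *args, **kwargs):
        # Let scrapy.Spider run its own setup before starting the browser
        super().__init__(*args, **kwargs)
        self.driver = webdriver.Firefox()

    def parse(self, response):
        self.driver.get(response.url)

        while True:
            try:
                # Adjust this XPath to match the page's "VER MAIS" button
                more_button = self.driver.find_element(
                    By.XPATH, '//div[@id="show_more"]/a')
                more_button.click()

                # get the data and write it to scrapy items
            except WebDriverException:
                # Button is gone or no longer clickable: nothing more to load
                break

        # End the browser session once all content has been loaded
        self.driver.quit()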
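
One weakness of the `while True` loop above is that it never ends if the button stays on the page, and it may click before the button is ready. Below is a minimal sketch of a bounded click loop, not a definitive implementation. The helper names `click_until_exhausted` and `make_selenium_clicker`, the `max_clicks` cap, and the 5-second timeout are all assumptions made for illustration; only the Selenium calls themselves (`WebDriverWait`, `expected_conditions.element_to_be_clickable`, `By.XPATH`) come from the library.

```python
from typing import Callable


def click_until_exhausted(click_once: Callable[[], bool], max_clicks: int = 50) -> int:
    """Call click_once() until it returns False or max_clicks is reached.

    Returns the number of successful clicks. The cap (hypothetical default
    of 50) guards against a button that never disappears.
    """
    clicks = 0
    while clicks < max_clicks and click_once():
        clicks += 1
    return clicks


def make_selenium_clicker(driver, xpath: str, timeout: float = 5.0) -> Callable[[], bool]:
    """Build a click_once callable backed by a Selenium driver.

    Selenium is imported lazily so the loop helper above can be used
    and tested without a browser installed.
    """
    from selenium.common.exceptions import WebDriverException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    def click_once() -> bool:
        try:
            # Wait until the button is present, visible and enabled
            button = WebDriverWait(driver, timeout).until(
                EC.element_to_be_clickable((By.XPATH, xpath))
            )
            button.click()
            return True
        except WebDriverException:
            # Covers TimeoutException (button never became clickable)
            # as well as click failures: treat both as "nothing more to load"
            return False

    return click_once
```

Inside `parse`, this could replace the loop with something like `click_until_exhausted(make_selenium_clicker(self.driver, '//div[@id="show_more"]/a'))`. After that, `scrapy.Selector(text=self.driver.page_source)` gives you the fully loaded page to extract items from with the usual Scrapy selectors.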

Upvotes: 10
