Reputation: 91
I want to run a Scrapy crawler on this web page: http://visit.rio/en/o-que-fazer/outdoors/ . However, some of the resources inside the element with id="container" load only after clicking a JavaScript button ("VER MAIS"). I've read a bit about Selenium, but I haven't gotten anywhere with it.
Upvotes: 2
Views: 9163
Reputation: 5240
You read right: your best bet is Scrapy plus Selenium, driving either Firefox or a headless browser such as PhantomJS for faster scraping.
Example adapted from https://stackoverflow.com/a/17979285/2781701
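import scrapy
from selenium import webdriver
from selenium.common.exceptions import WebDriverException
from selenium.webdriver.common.by import By


class ProductSpider(scrapy.Spider):
    name = "product_spider"
    allowed_domains = ['visit.rio']
    start_urls = ['http://visit.rio/en/o-que-fazer/outdoors']

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.driver = webdriver.Firefox()

    def parse(self, response):
        self.driver.get(response.url)
        while True:
            try:
                # Look up the "VER MAIS" link inside the try block, so that a
                # missing or unclickable link ends the loop instead of crashing.
                more = self.driver.find_element(By.XPATH, '//div[@id="show_more"]/a')
                more.click()
                # get the data and write it to scrapy items
            except WebDriverException:
                break
        # quit() ends the whole browser session, not just the current window
        self.driver.quit()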
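The click loop above runs until the lookup or the click fails, which can spin forever if the button never disappears. A minimal sketch of a bounded version follows. The helper name `click_until_gone` and its parameters `find_button`, `max_clicks`, `pause`, and `stop_on` are hypothetical, not part of Scrapy or Selenium, and the XPath in the usage note is the one from the example above, which I have not verified against the live page.

```python
import time


def click_until_gone(find_button, max_clicks=50, pause=1.0, stop_on=(Exception,)):
    """Click the element returned by find_button() until it fails or a cap is hit.

    find_button: zero-argument callable that returns an object with .click()
                 and raises when the button is no longer present.
    max_clicks:  upper bound on clicks, so the loop cannot run forever.
    pause:       seconds to wait after each click for new content to load.
    stop_on:     exception types that mean "no more button".
    Returns the number of successful clicks.
    """
    clicks = 0
    while clicks < max_clicks:
        try:
            find_button().click()
        except stop_on:
            break
        clicks += 1
        time.sleep(pause)
    return clicks
```

With Selenium, this could be called from `parse` as `click_until_gone(lambda: self.driver.find_element(By.XPATH, '//div[@id="show_more"]/a'), stop_on=(WebDriverException,))`. Once it returns, `self.driver.page_source` holds the expanded HTML to extract items from. A fixed `pause` is the simplest choice here; an explicit wait on the newly loaded elements would be more robust.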
Upvotes: 10