Reputation: 91
Background: I am using Scrapy to crawl and scrape product data from http://shop.nordstrom.com/c/mens-tshirts. The page is dynamically generated, so I use Scrapy-Splash to deal with the JavaScript. The problem is that clicking the "Next" button at the bottom of the page is the only way to get to the subsequent product page. If you copy the URL of page 2 and paste it into a new tab, the page has no products on it.
To work around this, I am trying to use Selenium's .click() method to navigate to the next page and driver.page_source to extract the HTML of that page.
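For context, here is a minimal sketch of that Selenium step (assuming Chrome/chromedriver; the CSS selector for the "Next" button is a guess and the fixed sleep is just a placeholder for a proper wait):

import time
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("http://shop.nordstrom.com/c/mens-tshirts")

# Click the "Next" button (selector is a placeholder) and give page 2 time to render.
driver.find_element(By.CSS_SELECTOR, "a.next").click()
time.sleep(5)

# Rendered HTML of page 2, which I want to hand off to Splash.
html_source = driver.page_source
driver.quit()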
Question: Is there a way to pass the HTML/JavaScript source that I extract into Splash (running inside a Docker container), rather than passing in a URL? I've tried saving the HTML on my local machine and passing the file path, but that results in a 502 Bad Gateway because Splash automatically prepends 'http://' to the path.
Maybe there's a better method for achieving my goal here; if so, I'm open to any options. Please keep in mind that the solution must be appropriate for scalability and cloud deployment. Thanks!
Upvotes: 1
Views: 1768
Reputation: 22238
You can write a Splash Lua script which calls splash:set_content instead of accepting a URL, something like this:
function main(splash, args)
  assert(splash:set_content(args.html_source))
  -- page is loaded, process it as needed
end
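For what it's worth, here is a rough sketch of how that script could be called from a Scrapy spider through scrapy-splash's /execute endpoint, passing the HTML you extracted as an argument (this assumes scrapy-splash is already configured with SPLASH_URL and its middlewares; the placeholder HTML and the parse callback are only illustrative):

import scrapy
from scrapy_splash import SplashRequest

LUA_SET_CONTENT = """
function main(splash, args)
  assert(splash:set_content(args.html_source))
  return splash:html()
end
"""

class TshirtsSpider(scrapy.Spider):
    name = "tshirts"

    def start_requests(self):
        # html_source is whatever driver.page_source gave you; a placeholder here.
        html_source = "<html><body>...</body></html>"
        yield SplashRequest(
            "http://shop.nordstrom.com/c/mens-tshirts",
            callback=self.parse,
            endpoint="execute",
            args={"lua_source": LUA_SET_CONTENT, "html_source": html_source},
        )

    def parse(self, response):
        # response.text is the HTML that Splash rendered from html_source.
        self.logger.info("Got %d bytes of rendered HTML", len(response.text))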
You can also click on a button in Splash itself - see element:mouse_click, something like this:
function main(splash, args)
  assert(splash:go(args.url))
  splash:select('.next'):mouse_click()
  splash:wait(5.0)
  return splash:html()
end
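If you go this route you don't need Selenium at all, which also fits the scalability requirement: the whole pagination step runs inside the Splash container. As a rough sketch, the same script can be sent directly to Splash's /execute HTTP API (the http://localhost:8050 address assumes a default local Docker setup, and '.next' is still just a guess at the button's selector):

import requests

lua_source = """
function main(splash, args)
  assert(splash:go(args.url))
  splash:select('.next'):mouse_click()
  splash:wait(5.0)
  return splash:html()
end
"""

resp = requests.post(
    "http://localhost:8050/execute",
    json={"lua_source": lua_source, "url": "http://shop.nordstrom.com/c/mens-tshirts"},
)
resp.raise_for_status()
page_two_html = resp.text  # HTML returned by the script's return splash:html()

In a Scrapy project you would normally send the same lua_source via SplashRequest(..., endpoint='execute') instead of a raw requests call.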
Check the tutorial and Lua API overview for more. You can interact with the page much like in Selenium; not all Selenium helpers are available, but the basics are there.
Upvotes: 1