Scrapy with Splash doesn't wait for website to load

Question

I am trying to render and scrape an interactive website by invoking Splash through the Python script, basically following this tutorial:

import scrapy
from scrapy_splash import SplashRequest

class MySpider(scrapy.Spider):
    start_urls = ["http://example.com"]

    def start_requests(self):
        for url in self.start_urls:
            yield SplashRequest(url, self.parse,
                endpoint='render.html',
                args={'wait': 0.5},
            )

    def parse(self, response):
        filename = 'mywebsite-%s.html' % '1'
        with open(filename, 'wb') as f:
            f.write(response.body)

The output looks fine, however it's missing a part of the website that is loaded through ajax after a second or two, which is the content I actually need. Now the weird part is, if I access Splash directly inside the container through the web interface, set the same URL, and hit the Render button, the returned response is correct. So, the only question is, why when the Python script invokes it, it doesn't render the website correctly?

Scrapy with Splash doesn't wait for website to load

Answers (1)

Related Questions

Scrapy with Splash doesn&#39;t wait for website to load

Answers (1)

Related Questions

Scrapy with Splash doesn't wait for website to load