Dayne Jones

Reputation: 153

Scrapy Request works but not SplashRequest

I'm making a very simple GET request through Splash. Both the Splash debug page and a plain scrapy.Request against the render.html endpoint work fine. When I use scrapy_splash.SplashRequest instead, I get back an unrendered page with an empty tag.

Code that works:

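import scrapy
from scrapy.http.headers import Headers


class AccountSpider(scrapy.Spider):
    name = 'account'

    def start_requests(self):
        # Call Splash's render.html HTTP endpoint directly with a plain Request
        RENDER_HTML_URL = "http://0.0.0.0:8050/render.html?url={url}&wait=2"
        url = 'redacted'
        headers = Headers({'Content-Type': 'application/json'})
        # self.login is the response callback (not shown)
        yield scrapy.Request(
            RENDER_HTML_URL.format(url=url),
            self.login,
            method="GET",
            headers=headers
        )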

Code that doesn't work:

import scrapy
import scrapy_splash


class AccountSpider(scrapy.Spider):
    name = 'account'

    def start_requests(self):
        # Same page and wait time, but routed through SplashRequest
        yield scrapy_splash.SplashRequest(
            'redacted',
            self.login,
            endpoint='render.html',
            args={
                'wait': 2,
            },
        )

settings.py has the default setup suggested by the scrapy-splash project.

Here's my SPLASH_URL setting:

SPLASH_URL = 'http://0.0.0.0:8050'
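
For readers without the scrapy-splash README at hand, the "default setup" usually means entries along these lines, placed next to SPLASH_URL. The actual settings.py is not shown in the question, so this is only an assumption about what it contains, and the middleware order values are worth checking against the README of the installed version.

```python
# Sketch of the setup suggested by the scrapy-splash README
# (assumed, not copied from the asker's settings.py).
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}

SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}

# Make duplicate filtering and HTTP caching aware of Splash arguments
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'
```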

Upvotes: 0

Views: 731

Answers (1)

Dayne Jones

Reputation: 153

From the Splash server's log output, I determined that SplashRequest "magically" sets headers on the request it sends to Splash, User-Agent included.

Changing my code to the following fixes my problem:

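import scrapy
import scrapy_splash


class AccountSpider(scrapy.Spider):
    name = 'account'

    def start_requests(self):
        yield scrapy_splash.SplashRequest(
            'redacted',
            self.login,
            endpoint='render.html',
            args={
                'wait': 2,
                # Explicit empty headers replace the ones SplashRequest would set
                'headers': {}
            },
        )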
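
A rough model of why the empty 'headers' entry helps, assuming the scrapy-splash middleware only fills in 'headers' when the caller has not already supplied that key. The function build_splash_args and the sample User-Agent string are hypothetical and exist only for illustration; this is a sketch of the behavior described above, not the library's actual code.

```python
def build_splash_args(user_args, scrapy_headers, dont_send_headers=False):
    """Hypothetical model of how request headers end up in the Splash args."""
    args = dict(user_args)  # copy so the caller's dict is not mutated
    if not dont_send_headers and scrapy_headers:
        # setdefault only fills 'headers' when the caller did not supply the key
        args.setdefault('headers', dict(scrapy_headers))
    return args


# Placeholder value; the real User-Agent depends on the Scrapy settings
scrapy_headers = {'User-Agent': 'Scrapy (placeholder)'}

forwarded = build_splash_args({'wait': 2}, scrapy_headers)
overridden = build_splash_args({'wait': 2, 'headers': {}}, scrapy_headers)
skipped = build_splash_args({'wait': 2}, scrapy_headers, dont_send_headers=True)

print(forwarded)   # Scrapy headers are forwarded to Splash
print(overridden)  # the explicit empty dict is kept as-is
print(skipped)     # no 'headers' key is added at all
```

If that assumption holds, passing dont_send_headers=True to SplashRequest may be a more explicit way to get the same effect; that parameter is described in the scrapy-splash README as far as I know, but it is worth confirming against the installed version.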

Upvotes: 1
