Reputation: 153
I'm making a very simple GET request through Splash. Both the Splash debug page and a plain scrapy.Request sent to the render.html endpoint work fine. When I use scrapy_splash.SplashRequest instead, I get an unrendered page with an empty tag.
Code that works:
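import scrapy
from scrapy.http import Headers


class AccountSpider(scrapy.Spider):
    name = 'account'

    def start_requests(self):
        # Call Splash's render.html endpoint directly with a plain GET request.
        RENDER_HTML_URL = "http://0.0.0.0:8050/render.html?url={url}&wait=2"
        url = 'redacted'
        headers = Headers({'Content-Type': 'application/json'})
        yield scrapy.Request(
            RENDER_HTML_URL.format(url=url),
            self.login,  # callback defined elsewhere in the spider
            method="GET",
            headers=headers
        )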
Code that doesn't work:
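import scrapy
import scrapy_splash


class AccountSpider(scrapy.Spider):
    name = 'account'

    def start_requests(self):
        # Same request, but routed through scrapy-splash.
        yield scrapy_splash.SplashRequest(
            'redacted',
            self.login,  # callback defined elsewhere in the spider
            endpoint='render.html',
            args={
                'wait': 2,
            },
        )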
settings.py has the default setup suggested by the scrapy-splash project.
Here's my SPLASH_URL setting:
SPLASH_URL = 'http://0.0.0.0:8050'
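For context, the "default setup" mentioned above usually looks roughly like the sketch below. The middleware names and priority numbers are recalled from the scrapy-splash README rather than taken from this post, so treat them as assumptions and check them against the version you have installed.

```python
# settings.py (sketch; values other than SPLASH_URL are assumed from the
# scrapy-splash README, not quoted from the original question)
SPLASH_URL = 'http://0.0.0.0:8050'

# Downloader middlewares that route requests through Splash and handle cookies.
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}

# Spider middleware that de-duplicates repeated Splash arguments.
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}

# Splash-aware duplicate filter and HTTP cache storage.
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'
```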
Upvotes: 0
Views: 731
Reputation: 153
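Looking at the Splash server's output, I determined that SplashRequest "magically" sets headers on your request, User-Agent included. The plain scrapy.Request to render.html does not pass those headers on to the target page, which is why the two behaved differently.
Passing an empty 'headers' entry in args stops those headers from being sent, and changing my code to the following fixes my problem: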
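import scrapy
import scrapy_splash


class AccountSpider(scrapy.Spider):
    name = 'account'

    def start_requests(self):
        yield scrapy_splash.SplashRequest(
            'redacted',
            self.login,  # callback defined elsewhere in the spider
            endpoint='render.html',
            args={
                'wait': 2,
                # An explicit empty dict keeps the Scrapy headers from being forwarded.
                'headers': {}
            },
        )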
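Why the empty dict helps can be illustrated with a small stand-alone model. This is only a sketch of the behaviour I observed, not the library's actual code: `build_splash_args` is a hypothetical helper, and the assumption is that the middleware fills in 'headers' only when the caller has not already supplied that key.

```python
def build_splash_args(request_headers, args=None, dont_send_headers=False):
    """Hypothetical, simplified model of how Scrapy headers reach Splash args."""
    merged = dict(args or {})
    if not dont_send_headers and request_headers:
        # setdefault only fills 'headers' when the key is absent, so an
        # explicit empty dict supplied by the caller is left untouched.
        merged.setdefault("headers", dict(request_headers))
    return merged


# Example headers a Scrapy request might carry by default (illustrative values).
scrapy_headers = {"User-Agent": "Scrapy (+https://scrapy.org)", "Accept-Language": "en"}

forwarded = build_splash_args(scrapy_headers, {"wait": 2})
overridden = build_splash_args(scrapy_headers, {"wait": 2, "headers": {}})
suppressed = build_splash_args(scrapy_headers, {"wait": 2}, dont_send_headers=True)

print(forwarded)   # 'headers' carries the Scrapy User-Agent
print(overridden)  # 'headers' stays empty
print(suppressed)  # no 'headers' key at all
```

As far as I can tell, SplashRequest also accepts a dont_send_headers=True argument that has a similar effect, but I have only verified the 'headers': {} approach shown above.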
Upvotes: 1