Reputation: 573
I am trying to log into a website with Scrapy, but the response I receive is an HTML document containing only inline JavaScript. The JavaScript redirects to the page I want to scrape data from, but since Scrapy does not execute JavaScript, it never reaches that page.
I use the following code to submit the login form required:
def parse(self, response):
    request_id = response.css('input[name="request_id"]::attr(value)').extract_first()
    data = {
        'userid_placeholder': self.login_user,
        'foilautofill': '',
        'password': self.login_pass,
        'request_id': request_id,
        'username': self.login_user[1:]
    }
    yield scrapy.FormRequest(url='https://www1.up.ac.za/oam/server/auth_cred_submit', formdata=data,
                             callback=self.print_p)
The print_p callback function is as follows:
def print_p(self, response):
    print(response.text)
I have looked at scrapy-splash, but I could not find a way to execute the JS in the response with it.
Upvotes: 4
Views: 3035
Reputation: 367
Selenium can probably help you get past this JS.
If you haven't tried it yet, you can start from examples like this. Once you reach the target page, you can get its URL with:
self.driver.current_url
and scrape it afterwards.
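For illustration, a minimal sketch of that approach, assuming Selenium 4 with geckodriver; login_url is a placeholder and the input names are taken from the question's formdata keys, so they may need adjusting to the actual form:

# Sketch: log in with Selenium, let the browser execute the inline-JS redirect,
# then read the final URL and rendered HTML.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

login_url = 'https://www1.up.ac.za/...'  # hypothetical: the page showing the login form

driver = webdriver.Firefox()
try:
    driver.get(login_url)

    # Fill in the form (field names borrowed from the question; adjust if they differ)
    driver.find_element(By.NAME, 'userid_placeholder').send_keys('my_user')
    driver.find_element(By.NAME, 'password').send_keys('my_pass')
    driver.find_element(By.NAME, 'password').submit()

    # Wait until the inline JS has actually navigated away from the login URL
    WebDriverWait(driver, 10).until(EC.url_changes(login_url))

    target_url = driver.current_url  # the page the JS redirected to
    html = driver.page_source        # rendered HTML, ready for Scrapy/parsel selectors
finally:
    driver.quit()

From there you can either parse driver.page_source directly or yield a scrapy.Request for target_url, copying the browser's cookies over if the session is needed.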
Upvotes: 2
Reputation: 10210
I'd suggest using Splash as a rendering service. Personally, I found it more reliable than Selenium. Using Lua scripts, you can instruct it to interact with the page.
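As a rough sketch (not your exact setup), assuming scrapy-splash is installed and SPLASH_URL plus the scrapy-splash middlewares are configured in settings.py, you can send a page through Splash's execute endpoint so its inline JS runs and you get back the redirected page:

import scrapy
from scrapy_splash import SplashRequest

LUA_SCRIPT = """
function main(splash, args)
    assert(splash:go(args.url))
    splash:wait(2)            -- give the inline JS time to redirect
    return {
        url = splash:url(),   -- URL after the JS redirect
        html = splash:html()  -- rendered HTML of the final page
    }
end
"""

class RenderedSpider(scrapy.Spider):
    name = 'rendered_example'  # hypothetical spider name

    def start_requests(self):
        # Placeholder URL for the page whose JS needs to run
        yield SplashRequest('https://example.com/js-redirect', self.parse_rendered,
                            endpoint='execute', args={'lua_source': LUA_SCRIPT})

    def parse_rendered(self, response):
        # When the Lua script returns a table, scrapy-splash exposes it as response.data
        final_url = response.data['url']
        html = response.data['html']
        self.logger.info('JS redirected to %s', final_url)

The execute endpoint is what lets you script interactions (waits, clicks, form fills) in Lua; if you only need the JS to run, the simpler render.html endpoint is usually enough.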
Upvotes: 5