Craig

Reputation: 573

Execute inline JavaScript in Scrapy response

I am trying to log into a website with Scrapy, but the response received is an HTML document containing only inline JavaScript. The JS redirects to the page I want to scrape data from. But Scrapy does not execute the JS and therefore doesn't route to the page I want it to.

I use the following code to submit the login form required:

    def parse(self, response):
        request_id = response.css('input[name="request_id"]::attr(value)').extract_first()
        data = {
            'userid_placeholder': self.login_user,
            'foilautofill': '',
            'password': self.login_pass,
            'request_id': request_id,
            'username': self.login_user[1:]
        }
        yield scrapy.FormRequest(
            url='https://www1.up.ac.za/oam/server/auth_cred_submit',
            formdata=data,
            callback=self.print_p,
        )

The print_p callback function is as follows:

    def print_p(self, response):
        print(response.text)

I have looked at scrapy-splash but I could not find a way to execute the JS in the response with scrapy-splash.

Upvotes: 4

Views: 3035

Answers (2)

alexxmagpie

Reputation: 367

Selenium can probably help you get past this JS.

If you haven't tried it yet, you can start from an example like this. If you manage to reach the target page, you can get its URL with:

    self.driver.current_url

and scrape it from there.
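A rough sketch of what that could look like. The login-page URL and the form field names are assumptions (the question only shows the credential-submit endpoint), and it assumes Firefox with geckodriver on PATH; `build_login_data` just mirrors the form payload from the question's `FormRequest`:

```python
def build_login_data(login_user, login_pass, request_id):
    # Same form fields the question's FormRequest submits; note that
    # 'username' drops the first character of login_user, as in the original.
    return {
        'userid_placeholder': login_user,
        'foilautofill': '',
        'password': login_pass,
        'request_id': request_id,
        'username': login_user[1:],
    }


def login_and_get_url(login_page_url, login_user, login_pass):
    # Lazy import so the module still loads where selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Firefox()  # assumes geckodriver is on PATH
    try:
        driver.get(login_page_url)
        # Field names here are hypothetical -- inspect the real login form.
        driver.find_element(By.NAME, 'username').send_keys(login_user)
        driver.find_element(By.NAME, 'password').send_keys(login_pass)
        driver.find_element(By.NAME, 'password').submit()
        # The browser executes the inline JS, so once the redirect has
        # fired, current_url is the page you actually want to scrape.
        return driver.current_url
    finally:
        driver.quit()
```

From there you can hand the URL back to Scrapy (e.g. `yield scrapy.Request(driver.current_url, ...)`) and scrape it with your normal callbacks.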

Upvotes: 2

Tomáš Linhart

Reputation: 10210

I'd suggest using Splash as a rendering service. Personally, I found it more reliable than Selenium. Using Lua scripts, you can instruct it to interact with the page.
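As a sketch of that idea, assuming scrapy-splash is installed and a Splash instance is running and configured in `settings.py` (the 2-second wait is an arbitrary guess at how long the redirect needs):

```python
# Lua script for Splash: load the page, give the inline JS time to run
# its redirect, then return the final URL and the rendered HTML.
LUA_RENDER_SCRIPT = """
function main(splash, args)
  splash:go(args.url)
  splash:wait(2.0)  -- let the inline-JS redirect fire
  return {url = splash:url(), html = splash:html()}
end
"""


def make_splash_request(url, callback):
    # Lazy import: needs scrapy-splash (pip install scrapy-splash).
    from scrapy_splash import SplashRequest

    return SplashRequest(
        url,
        callback,
        endpoint='execute',
        args={'lua_source': LUA_RENDER_SCRIPT},
    )
```

In the callback, `response.url` and `response.text` then reflect the page the JS redirected to, so the rest of the spider can parse it normally.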

Upvotes: 5
