Scrapy FormRequest trouble when logging in

I'm trying to retrieve info from AliceWeb2. To do this I need to log in, but I can't. I entered my username and password, then looked in Google Chrome's inspect tool for a request with Request Method: POST, as shown below:

(Screenshot: Network tool.)

So, using the following code:

from scrapy.item import Item, Field
from scrapy.http import FormRequest
from scrapy.spiders import Spider  # scrapy.spider is the old, deprecated module path
from scrapy.utils.response import open_in_browser
from scrapy import Request


class AliceWeb2(Spider):
    name = "login"

    # Start on the welcome page
    def start_requests(self):
        return [Request(
                "http://aliceweb.mdic.gov.br//usuario/login/",
                callback=self.parse_welcome)]

    # Post welcome page's first form with the given user/pass
    def parse_welcome(self, response):
        formdata = {'logUser': 'rtadewald',
                    'logPass': '123'}
        return FormRequest.from_response(
            response,
            formdata=formdata)

I got this: (screenshot of the Scrapy command-line output showing the error)

Answers (1)

Umair Ayub

First of all, use yield instead of return.

Whenever you are trying to mimic a request, open Dev Tools, copy the request as cURL, and paste it into https://curl.trillworks.com/ to get equivalent Python code. Then adapt that code for Scrapy.


from scrapy import Request, Spider


class AliceWeb2(Spider):
    name = "login"

    # Start on the welcome page
    def start_requests(self):
        # Headers copied from the browser request in Dev Tools
        headers = {
            'Origin': 'http://aliceweb.mdic.gov.br',
            'Accept-Encoding': 'gzip, deflate',
            'Accept-Language': 'en-US,en;q=0.9',
            'Upgrade-Insecure-Requests': '1',
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36',
            'Content-Type': 'application/x-www-form-urlencoded',
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
            'Cache-Control': 'max-age=0',
            'Referer': 'http://aliceweb.mdic.gov.br/usuario/recuperar-senha',
            'Connection': 'keep-alive',
            'DNT': '1',
        }

        # The credentials are sent in the query string here. Note that
        # scrapy.Request takes body (not data) for a request payload.
        yield Request(
            'http://aliceweb.mdic.gov.br//usuario/login?tx_usuario=ABC&tx_senha=ABC',
            callback=self.parse_welcome,
            headers=headers,
            method="POST")

    def parse_welcome(self, response):
        # do something here
        pass

EDIT 1: When I inspected the request in Dev Tools, I saw that those parameters were being sent; see FORM DATA in the screenshot.

(Another tip): Check "Preserve log" so you can still see the requests sent from the previous page in case the website redirects you to another page after login.
