Ali

Reputation: 338

Scrapy FormRequest returning 400 error while Python Requests works

Sending a POST request through Scrapy's FormRequest results in a 400 error, while the same request made through Python Requests succeeds.

The request headers and params shouldn't be the problem, because they work with Requests. What in Scrapy could be breaking this?

The following code was run inside scrapy shell:

url = 'https://www.tripadvisor.co.uk/ShowUserReviews-g2151208-d19219570-r792748373-Tumanyan_Khinkali_at_Tsaghkadzor-Tsakhkadzor_Kotayk_Province.html'
headers = {
    'authority': 'www.tripadvisor.co.uk',
    'method': 'POST',
    'scheme': 'https',
    'accept': 'text/html, */*',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.9',
    'cache-control': 'no-cache',
    'content-length': '102',
    'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'dnt': '1',
    'origin': 'https://www.tripadvisor.co.uk',
    'pragma': 'no-cache',
    'sec-ch-ua-mobile': '?0',
    'sec-fetch-dest': 'empty',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'same-origin',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36',
    'x-requested-with': 'XMLHttpRequest',
}
params = {
    'returnTo': '#REVIEWS',
    'filterLang': 'ALL',
    'changeSet': 'REVIEW_LIST'
}

Scrapy FormRequest returns a 400 error.

In [10]: req = scrapy.http.FormRequest(
    ...:             url,
    ...:             method='POST',
    ...:             formdata=params,
    ...:             headers=headers)

In [11]: fetch(req)
2021-06-26 21:28:18 [scrapy.core.engine] DEBUG: Crawled (400) <POST https://www.tripadvisor.co.uk/ShowUserReviews-g2151208-d19219570-r792748373-Tumanyan_Khinkali_at_Tsaghkadzor-Tsakhkadzor_Kotayk_Province.html> (referer: None)

Python Requests returns a 200 and I can access the content.

In [17]: r = requests.post(url=url, headers=headers, json=params)
2021-06-26 21:30:02 [urllib3.connectionpool] DEBUG: Starting new HTTPS connection (1): www.tripadvisor.co.uk:443
2021-06-26 21:30:04 [urllib3.connectionpool] DEBUG: https://www.tripadvisor.co.uk:443 "POST /ShowUserReviews-g2151208-d19219570-r792748373-Tumanyan_Khinkali_at_Tsaghkadzor-Tsakhkadzor_Kotayk_Province.html HTTP/1.1" 200 16360

In [18]: r.status_code
Out[18]: 200

Upvotes: 1

Views: 486

Answers (1)

Md. Fazlul Hoque

Reputation: 16189

I can't access the URL from here, so I can't confirm this works, but you may try the following code. You also have to add a user-agent header.

import scrapy


class ReviewsSpider(scrapy.Spider):
    name = 'reviews'
    # Raw form-encoded payload; it is sent as-is, so the content-type
    # header below has to match this encoding.
    body = "reqNum=1&isLastPoll=false&paramSeqId=0&waitTime=41&changeSet=REVIEW_LIST&puid=YNgN2QokGScAA0-MH9MAAAIQ"

    def start_requests(self):
        yield scrapy.Request(
            url='https://www.tripadvisor.co.uk/ShowUserReviews-g2151208-d19219570-r791416821-Tumanyan_Khinkali_at_Tsaghkadzor-Tsakhkadzor_Kotayk_Province.html',
            method="POST",
            body=self.body,
            callback=self.parse,
            headers={
                'content-type': 'application/x-www-form-urlencoded',
                'x-puid': 'YNgN2QokGScAA0-MH9MAAAIQ',
                'x-requested-with': 'XMLHttpRequest',
                # Add a 'user-agent' entry here, e.g. the one from the question.
            },
        )

    def parse(self, response):
        pass
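
One thing worth checking in the question: the two calls are not actually the same request. FormRequest(formdata=params) sends a form-encoded body, requests.post(json=params) sends a JSON body, and the copied headers hardcode 'content-length': '102', which matches neither of them. It does match the length of the body string in the spider above, so that header was probably copied from a browser request with that payload. The entries 'authority', 'method' and 'scheme' also look like HTTP/2 pseudo-headers copied from the browser's dev tools rather than real request headers. I can't say for certain that this is what triggers the 400, but a stale content-length is a plausible cause. Below is a rough stdlib-only sketch to compare the bodies and strip the suspect headers; clean_headers and DROP are hypothetical names of my own, not part of Scrapy or Requests, and the header dict is trimmed to the entries that matter here.

```python
import json
from urllib.parse import urlencode

# Params and (trimmed) headers as posted in the question.
params = {
    'returnTo': '#REVIEWS',
    'filterLang': 'ALL',
    'changeSet': 'REVIEW_LIST',
}
copied_headers = {
    'authority': 'www.tripadvisor.co.uk',
    'method': 'POST',
    'scheme': 'https',
    'content-length': '102',
    'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'x-requested-with': 'XMLHttpRequest',
}

# Body string used in the spider above.
answer_body = (
    "reqNum=1&isLastPoll=false&paramSeqId=0&waitTime=41"
    "&changeSet=REVIEW_LIST&puid=YNgN2QokGScAA0-MH9MAAAIQ"
)

# Assumption: these entries are better left to the HTTP client. The first
# four look like HTTP/2 pseudo-headers; content-length depends on the body.
DROP = {'authority', 'method', 'scheme', 'path', 'content-length'}


def clean_headers(headers):
    """Hypothetical helper: drop headers the client should compute itself."""
    return {k: v for k, v in headers.items() if k.lower() not in DROP}


# Approximately what formdata=params (Scrapy) and json=params (Requests)
# would put in the request body.
form_body = urlencode(params)
json_body = json.dumps(params)

declared = int(copied_headers['content-length'])
print('declared content-length:', declared)
print('form-encoded body length:', len(form_body))
print('JSON body length:', len(json_body))
print('spider body length:', len(answer_body))
print('cleaned headers:', clean_headers(copied_headers))
```

If the lengths disagree, try passing the cleaned headers to FormRequest (or the scrapy.Request above) so the content-length is computed from the body that is actually sent.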

Upvotes: 2
