Scrapy FormRequest returning 400 error while Python Requests works

Question

Sending a POST request through Scrapy's FormRequest results in a 400 error, while the same request made through Python Requests succeeds.

The request headers and params can't be the problem, because they work with Requests. What in Scrapy could be breaking this?

The following code was run inside scrapy shell:

url = 'https://www.tripadvisor.co.uk/ShowUserReviews-g2151208-d19219570-r792748373-Tumanyan_Khinkali_at_Tsaghkadzor-Tsakhkadzor_Kotayk_Province.html'
headers = {
    'authority': 'www.tripadvisor.co.uk',
    'method': 'POST',
    'scheme': 'https',
    'accept': 'text/html, */*',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.9',
    'cache-control': 'no-cache',
    'content-length': '102',
    'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'dnt': '1',
    'origin': 'https://www.tripadvisor.co.uk',
    'pragma': 'no-cache',
    'sec-ch-ua-mobile': '?0',
    'sec-fetch-dest': 'empty',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'same-origin',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36',
    'x-requested-with': 'XMLHttpRequest',
}
params = {
    'returnTo': '#REVIEWS',
    'filterLang': 'ALL',
    'changeSet': 'REVIEW_LIST'
}

Scrapy FormRequest returns a 400 error.

In [10]: req = scrapy.http.FormRequest(
    ...:             url,
    ...:             method='POST',
    ...:             formdata=params,
    ...:             headers=headers)

In [11]: fetch(req)
2021-06-26 21:28:18 [scrapy.core.engine] DEBUG: Crawled (400)  (referer: None)

Python Requests returns a 200 and I can access the content.

In [17]: r = requests.post(url=url, headers=headers, json=params)
2021-06-26 21:30:02 [urllib3.connectionpool] DEBUG: Starting new HTTPS connection (1): www.tripadvisor.co.uk:443
2021-06-26 21:30:04 [urllib3.connectionpool] DEBUG: https://www.tripadvisor.co.uk:443 "POST /ShowUserReviews-g2151208-d19219570-r792748373-Tumanyan_Khinkali_at_Tsaghkadzor-Tsakhkadzor_Kotayk_Province.html HTTP/1.1" 200 16360

In [18]: r.status_code
Out[18]: 200
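
One thing worth checking is that the two calls above do not serialize params the same way: FormRequest with formdata sends a URL-encoded form body, whereas requests.post(..., json=params) sends a JSON body. The following is a minimal standard-library sketch of the two encodings, reusing the params dict from above. It only approximates what each library puts on the wire and makes no network call; whether this difference explains the 400 is an assumption, not something confirmed here.

```python
import json
from urllib.parse import urlencode

params = {
    'returnTo': '#REVIEWS',
    'filterLang': 'ALL',
    'changeSet': 'REVIEW_LIST',
}

# Approximately what FormRequest(formdata=params) builds: a URL-encoded form body.
form_body = urlencode(params)

# Approximately what requests.post(json=params) builds: a JSON document.
json_body = json.dumps(params)

print(form_body)
print(json_body)
print(len(form_body), len(json_body))
```

Neither body is 102 bytes long, so the hard-coded 'content-length': '102' header does not describe either payload. Requests computes its own Content-Length for the body it sends, which may be part of why the two clients behave differently.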

Md. Fazlul Hoque · Accepted Answer

I can't access the URL from here, so try the following code and see whether it works. You also have to add a user-agent header.

import scrapy

class ReviewsSpider(scrapy.Spider):
    name = 'reviews'
    body = "reqNum=1&isLastPoll=false&paramSeqId=0&waitTime=41&changeSet=REVIEW_LIST&puid=YNgN2QokGScAA0-MH9MAAAIQ"
    def start_requests(self):
        # POST the pre-encoded form body with a matching content-type header.
        yield scrapy.Request(
            url='https://www.tripadvisor.co.uk/ShowUserReviews-g2151208-d19219570-r791416821-Tumanyan_Khinkali_at_Tsaghkadzor-Tsakhkadzor_Kotayk_Province.html',
            method='POST',
            body=self.body,
            callback=self.parse,
            headers={
                'content-type': 'application/x-www-form-urlencoded',
                'x-puid': 'YNgN2QokGScAA0-MH9MAAAIQ',
                'x-requested-with': 'XMLHttpRequest',
                # Add a 'user-agent' entry here as well.
            },
        )

    def parse(self, response):
        pass
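
If the headers are copied from the browser's network tab, as in the question, it may also help to drop the entries that are not ordinary request headers before passing them to Scrapy. The sketch below is an illustration under stated assumptions, not a confirmed fix: the helper name clean_browser_headers and the set of dropped keys are hypothetical choices. It removes the HTTP/2 pseudo-header names (authority, method, scheme, path) and the hard-coded content-length, and builds the form body from a dict with urlencode instead of a hand-written string.

```python
from urllib.parse import urlencode

# Keys shown by the browser's network tab that should not be sent as ordinary
# headers: HTTP/2 pseudo-headers, plus a length the HTTP client computes itself.
# (Hypothetical set, chosen for illustration.)
DROP = {'authority', 'method', 'scheme', 'path', 'content-length'}


def clean_browser_headers(headers):
    """Return a copy of headers without the keys listed in DROP."""
    return {
        key: value
        for key, value in headers.items()
        if key.lower().lstrip(':') not in DROP
    }


# The same fields as the body string in the spider above, as a dict.
form = {
    'reqNum': '1',
    'isLastPoll': 'false',
    'paramSeqId': '0',
    'waitTime': '41',
    'changeSet': 'REVIEW_LIST',
    'puid': 'YNgN2QokGScAA0-MH9MAAAIQ',
}
body = urlencode(form)

# A shortened copy of the header dict from the question.
copied = {
    'authority': 'www.tripadvisor.co.uk',
    'method': 'POST',
    'scheme': 'https',
    'content-length': '102',
    'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'x-requested-with': 'XMLHttpRequest',
}
headers = clean_browser_headers(copied)

print(body)
print(len(body))
print(sorted(headers))
```

The resulting body and headers can then be passed to scrapy.Request(url, method='POST', body=body, headers=headers). The encoded body is 102 bytes long, which matches the content-length value in the question's headers; that suggests the header was captured for this six-field body rather than for the three-field params dict.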
