Reputation: 46
The site that I am trying to scrape must follow 302 observed by the browser network tools. I've copied the network request as Curl and its working fine, but when I convert it to python requests its just returning 200.
Curl:
curl -v 'https://my-site.com/CardAuthentication.aspx' \
-H 'Connection: keep-alive' \
-H 'Pragma: no-cache' \
-H 'Cache-Control: no-cache' \
-H 'Upgrade-Insecure-Requests: 1' \
-H 'Origin: https://my-site.com' \
-H 'Content-Type: multipart/form-data; boundary=----WebKitFormBoundaryBN2XwBbcAwvRWZzk' \
-H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36' \
-H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9' \
-H 'Sec-GPC: 1' \
-H 'Sec-Fetch-Site: same-origin' \
-H 'Sec-Fetch-Mode: navigate' \
-H 'Sec-Fetch-User: ?1' \
-H 'Sec-Fetch-Dest: document' \
-H 'Referer: https://my-site.com/CardAuthentication.aspx' \
-H 'Accept-Language: en-US,en;q=0.9' \
-H 'Cookie: ASP.NET_SessionId=...; TS0192fa71=...; ServiceProvider=UID=...; .ASPXAUTH=...' \
--data-raw $'------WebKitFormBoundaryBN2XwBbcAwvRWZzk\r\ DATA \r\n' \
--compressed
this is returning:
* Trying IP:443...
........
........
< HTTP/1.1 302 Found
< Cache-Control: no-cache
< Pragma: no-cache
< Content-Type: text/html; charset=utf-8
< Expires: -1
< Location: my-site.com/MemberDetails.aspx
< Date: Mon, 07 Feb 2022 09:54:49 GMT
< Content-Length: 61211
< Set-Cookie: TS0192fa71=...; Path=/; Domain=.my-site.com; Secure
Python:
import requests
import logging
logging.basicConfig(level=logging.DEBUG)
cookies = {
"ASP.NET_SessionId": ".....",
"TS0192fa71": ".....",
"ServiceProvider": ".....",
".ASPXAUTH": ".....",
}
headers = {
......
"Content-Type": "multipart/form-data; boundary=----WebKitFormBoundaryBN2XwBbcAwvRWZzk",
"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36",
"Referer": "https://my-site.com/CardAuthentication.aspx",
"Accept-Language": "en-US,en;q=0.9",
}
data = {
"------WebKitFormBoundaryBN2XwBbcAwvRWZzk\r\nContent-Disposition: form-data; name": '"__EVENTTARGET"\r\n\r\n\r\n-- DATA
}
response = requests.post(
"https://my-site.com/CardAuthentication.aspx",
headers=headers,
cookies=cookies,
data=data,
)
And the return code is 200, with response history empy.
Is there something wrong with requests library or in the way that request library is processing the data? How can I solve this?
Upvotes: 0
Views: 900
Reputation: 36370
Is there something wrong with requests library or in the way that request library is processing the data? How can I solve this?
This is default requests
behavior, you need to set allow_redirects
to False
if you want to not follow in case of 3xx response code, example
import requests
r = requests.get("http://github.com", allow_redirects=False)
print(r.status_code) # 301
If you want to know more read request.request
docs.
Upvotes: 2