Reputation: 5277
I tried through several means, but nowhere do I find a satisfatory answer to this -
What are the differences between Python "requests" module and Linux "curl" command? Does "requests" use "curl" underlying, or is it totally different way of dealing with HTTP request/response?
For most of the requests, they both behave in the same way (as it should be), but sometimes, I find a difference in response and it is really hard to figure out why is it so.
eg. Using curl
for HEAD
request:
curl --head https://historia.sherpadesk.com
HTTP/2 302
content-type: text/html; charset=utf-8
date: Mon, 28 Feb 2022 20:31:30 GMT
access-control-expose-headers: Request-Context
cache-control: private
location: /login/?ref=portal
set-cookie: ASP.NET_SessionId=nghpw4qp5cw2ntwmwfuxw3oi; path=/; HttpOnly; SameSite=Lax
content-length: 135
request-context: appId=cid-v1:d5f9900e-ecd4-442f-9e92-e11b4cdbc0c9
x-frame-options: SAMEORIGIN
x-xss-protection: 1
x-content-type-options: nosniff
strict-transport-security: max-age=31536000
and if I use -L
to follow redirects,
curl --head https://historia.sherpadesk.com -L
HTTP/2 302
content-type: text/html; charset=utf-8
date: Mon, 28 Feb 2022 20:31:37 GMT
access-control-expose-headers: Request-Context
cache-control: private
location: /login/?ref=portal
set-cookie: ASP.NET_SessionId=trzp0bql4nibswux5z5wfayy; path=/; HttpOnly; SameSite=Lax
content-length: 135
request-context: appId=cid-v1:d5f9900e-ecd4-442f-9e92-e11b4cdbc0c9
x-frame-options: SAMEORIGIN
x-xss-protection: 1
x-content-type-options: nosniff
strict-transport-security: max-age=31536000
HTTP/2 302
content-type: text/html; charset=utf-8
date: Mon, 28 Feb 2022 20:31:38 GMT
access-control-expose-headers: Request-Context
location: https://app.sherpadesk.com/login/?ref=portal
content-length: 161
request-context: appId=cid-v1:d5f9900e-ecd4-442f-9e92-e11b4cdbc0c9
x-frame-options: SAMEORIGIN
x-xss-protection: 1
x-content-type-options: nosniff
strict-transport-security: max-age=31536000
HTTP/2 200
content-type: text/html; charset=utf-8
date: Mon, 28 Feb 2022 20:31:39 GMT
access-control-expose-headers: Request-Context
cache-control: no-store, no-cache
expires: -1
pragma: no-cache
set-cookie: ASP.NET_SessionId=aqmnxu2s3qkri3sravsrs1cq; path=/; HttpOnly; SameSite=Lax
content-length: 8935
request-context: appId=cid-v1:d5f9900e-ecd4-442f-9e92-e11b4cdbc0c9
x-frame-options: SAMEORIGIN
x-xss-protection: 1
x-content-type-options: nosniff
strict-transport-security: max-age=31536000
and here is the (debug) output when I use Python's requests module requests.head(url)
:
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): historia.sherpadesk.com:443
send: b'HEAD / HTTP/1.1\r\nHost: historia.sherpadesk.com\r\nUser-Agent: python-requests/2.26.0\r\nAccept-Encoding: gzip, deflate, br\r\nAccept: */*\r\nConnection: keep-alive\r\n\r\n'
reply: 'HTTP/1.1 403 Forbidden: Access is denied.\r\n'
header: Content-Length: 58
header: Content-Type: text/html
header: Date: Mon, 28 Feb 2022 20:36:18 GMT
header: X-Frame-Options: SAMEORIGIN
header: X-XSS-Protection: 1
header: X-Content-Type-Options: nosniff
header: Strict-Transport-Security: max-age=31536000
DEBUG:urllib3.connectionpool:https://historia.sherpadesk.com:443 "HEAD / HTTP/1.1" 403 0
INFO:root:URL: https://historia.sherpadesk.com/
INFO:root:<Response [403]>
which just results in 403
response code. Response is same whether allow_redirects
is True/False
. I have also tried using proxy with python code, as I thought maybe its getting blocked as this URL might be recognising Python's request to be a bot, but that also fails. Also, if that was the case, why does curl succeed?
So, my main question here is: what are the major differences between curl and requests, which might cause difference in responses in certain cases?
If possible, I would really like thorough explanation which could help me debug and resolve these issues.
Upvotes: 3
Views: 1943
Reputation: 69937
The two libraries are different but the problem here is related to user agent.
When I try with curl, specifying the python-requests
user agent:
$ curl --head -A "python-requests/2.26.0" https://historia.sherpadesk.com/
HTTP/2 403
content-type: text/html
date: Mon, 28 Feb 2022 22:30:02 GMT
content-length: 58
x-frame-options: SAMEORIGIN
x-xss-protection: 1
x-content-type-options: nosniff
strict-transport-security: max-age=31536000
With curl default user agent:
$ curl --head https://historia.sherpadesk.com/
HTTP/2 302
...
Apparently, they have some type of website security that is blocking HTTP clients like python-requests, but not curl for some reason.
Upvotes: 4