Reputation: 178
Not much to say: I'm sending a request to some URL. One HTTP client returns 403, the other succeeds. There is no difference in the headers I set. What else can be different?
>>> print(httpx.get(url).status_code)
200
>>> print(requests.get(url).status_code)
403
Upvotes: 14
Views: 15102
Reputation: 75
It could be that the site doesn't accept HTTP/1.1 requests and only serves HTTP/2. httpx supports HTTP/2 (it is opt-in and has to be enabled with http2=True), whereas requests only speaks HTTP/1.1.
Upvotes: 5
Reputation: 383
My guess would be that the website is blocking requests made with the requests library.
With each request you leave behind a fingerprint of the client that made it. Among other information, this fingerprint contains the User-Agent header.
When you make a request with the requests library, the default User-Agent is something like "python-requests/2.28.0", while for the httpx library it's something like "python-httpx/0.23.0".
Based on the information in your fingerprint, a website may choose to handle your request differently from others, for example by returning an error page instead of the real content.
Python is often used for web scraping, and requests is one of the most popular libraries for it, so many people use it in their projects. Because of that, many websites choose not to serve real content to requests with a "python-requests/*" User-Agent.
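To make the idea concrete, here is a toy sketch of what such a server-side rule could look like. The blocklist and the function name status_for are purely hypothetical; real sites use their own (usually private and more elaborate) rules.

```python
import re

# Hypothetical blocklist of User-Agent prefixes a site might refuse.
BLOCKED_UA_PATTERNS = [
    re.compile(r"^python-requests/"),
    re.compile(r"^curl/"),
]


def status_for(user_agent: str) -> int:
    """Return 403 for blocklisted clients and 200 for everyone else."""
    if any(pattern.search(user_agent) for pattern in BLOCKED_UA_PATTERNS):
        return 403
    return 200


print(status_for("python-requests/2.28.0"))
print(status_for("python-httpx/0.23.0"))
```

With a rule like this, requests gets a 403 while httpx slips through, which matches what you are seeing.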
Handling requests costs money, so websites try to serve content only to real people rather than bots, and bot traffic tends to be much larger than human traffic. Some site owners also simply want to stop third parties from using their content in ways they don't like or don't know about.
The httpx library identifies itself in the same way, but it's not as widely known as requests, so you'll find more websites that don't block it.
Yes, there are many online resources about the topic. Just search for something like "avoid requests blocking".
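The usual first step is to override the default User-Agent. A minimal sketch, assuming a browser-like string is enough for the site in question; BROWSER_UA is just an example value and build_session is a made-up helper name.

```python
import requests

# Example value only; any realistic browser User-Agent string would do.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"


def build_session(user_agent: str = BROWSER_UA) -> requests.Session:
    """Create a session that sends the given User-Agent on every request."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


session = build_session()
# session.get(url) would now send the overridden header instead of python-requests/x.y.z.
```

This changes only one header. A site that looks at other parts of the fingerprint may still answer with 403.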
Upvotes: 13