Reputation: 131058
I am trying to use the requests library to get content from a URL. In more detail, I do it in the following way:
import requests
proxies = {'http':'my_proxy.blabla.com/'}
r = requests.get(url, proxies = proxies)
print r.text
As a result I get the following:
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>404 - Not Found</title>
</head>
<body>
<h1>404 - Not Found</h1>
</body>
</html>
So it looks like the proxy let me through and I reached the server. However, the web server was unable to interpret my request (wrong path or similar) and did not know what content to return. Am I interpreting this correctly?
What can be the reason for that? I do get the expected content if I put the URL in one of my browsers.
ADDED
It has been suggested in the comments that the root of the problem is in the headers. So, I used this web site: http://www.procato.com/my+headers/ to find out what headers are sent by my browser. I used these values to set the headers variable passed to the requests.get function. I set the values for the following keys: 'User-Agent', 'Accept', 'Referer', 'Accept-Encoding', 'Accept-Language', 'X-Forwarded-For', 'Cache-Control', 'Connection'. Unfortunately, this did not resolve the problem. I am still getting the same 404 response.
ADDED 2
I have tested my function with two different URLs and got exactly the same response. So, my previous assumption that the response (the XML that I see) comes from the web server is probably wrong. It is unlikely that two completely different web servers (one of them was Google) would generate the same response.
So, now I do not understand where the XML comes from. Can it be that it comes from the proxy server?
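One way to narrow down who produced the 404 is to look at the response metadata rather than only the body. The sketch below is a rough illustration, not a definitive diagnostic: describe_response is a hypothetical helper name, and the sample object is a made-up stand-in so the snippet runs offline. Many proxies add a Via header and the Server header often names the software that answered, but neither is guaranteed to be present.

```python
from types import SimpleNamespace


def describe_response(response):
    """Collect the fields that hint at who generated a response.

    Works on any object with status_code, url and headers attributes,
    such as the object returned by requests.get.
    """
    headers = response.headers
    return {
        'status': response.status_code,
        'final_url': response.url,
        # Often names the software that produced the reply (may be absent).
        'server': headers.get('Server'),
        # Commonly added by intermediaries such as proxies (may be absent).
        'via': headers.get('Via'),
    }


# Made-up stand-in for a real response so the sketch runs without a network.
sample = SimpleNamespace(
    status_code=404,
    url='http://example.com/',
    headers={'Server': 'hypothetical-proxy', 'Via': '1.1 hypothetical-proxy'},
)
summary = describe_response(sample)
print(summary)
```

With a real request, the same helper could be called as describe_response(requests.get(url, proxies=proxies)). If the summary looks identical for two unrelated sites, that would be consistent with the reply coming from the proxy rather than from the target servers.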
Upvotes: 1
Views: 15089
Reputation: 1852
import requests
URL = 'https://www.blahblah.com'
proxy = {'http': 'http://www.blahblah.com'}
r = requests.get(URL, proxies = proxy)
print r.text
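Two details of the proxies mapping are worth spelling out. requests expects each proxy URL to include its scheme, and it picks the entry whose key matches the scheme of the target URL, so an 'http' entry alone is not used for an https:// target. The following is a minimal sketch under those assumptions; build_proxies is a hypothetical helper, and treating a bare host name as an HTTP proxy is a guess about the setup, not something stated in the question.

```python
def build_proxies(proxy_address):
    """Return a proxies mapping covering both http and https targets."""
    # Assumption: a bare host such as 'my_proxy.blabla.com/' is an HTTP proxy,
    # so prepend the scheme that requests expects in proxy URLs.
    if '://' not in proxy_address:
        proxy_address = 'http://' + proxy_address
    # One entry per target scheme, both routed through the same proxy.
    return {'http': proxy_address, 'https': proxy_address}


proxies = build_proxies('my_proxy.blabla.com/')
print(proxies)

# Usage (needs network access and a reachable proxy):
#   import requests
#   r = requests.get('https://www.blahblah.com', proxies=proxies)
#   print(r.status_code)
```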
Upvotes: 1