Roman

Reputation: 131058

Why does Python requests get a 404 error?

I am trying to use the requests library to get the content of a URL. In more detail, I do it the following way:

import requests

proxies = {'http':'my_proxy.blabla.com/'}
r = requests.get(url, proxies=proxies)
print(r.text)

As a result I get the following:

<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
         "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
 <head>
  <title>404 - Not Found</title>
 </head>
 <body>
  <h1>404 - Not Found</h1>
 </body>
</html>

So it looks like the proxy let me through and I reached the server, but the web server was unable to interpret my request (a wrong path or similar) and did not know what content to return. Am I interpreting this correctly?

What could be the reason for that? I do get the expected content when I open the same URL in one of my browsers.

ADDED

It has been suggested in the comments that the root of the problem is in the headers. So I used this web site: http://www.procato.com/my+headers/ to find out which headers my browser sends, and passed those values in the headers argument of requests.get. I set values for the following keys: 'User-Agent', 'Accept', 'Referer', 'Accept-Encoding', 'Accept-Language', 'X-Forwarded-For', 'Cache-Control', 'Connection'. Unfortunately, this did not resolve the problem. I am still getting the same 404 response.

ADDED 2

I have tested my function with two different URLs and got exactly the same response. So my previous assumption that the response (the XML that I see) comes from the web server is probably wrong. It is unlikely that two completely different web servers (one of them was Google) would generate identical responses.

So now I do not understand where the XML comes from. Could it be coming from the proxy server?
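
One way to narrow this down would be to look at the status code and the response headers instead of only the body. The following is a minimal sketch under some assumptions: `summarize_response` is a hypothetical helper name, and the `Server` and `Via` headers are only hints, since a proxy or server may omit either of them.

```python
def summarize_response(status_code, headers):
    """Collect the fields that hint at which machine produced a response."""
    # HTTP header names are case-insensitive, so normalise them to lower
    # case before looking them up. This also works for a plain dict.
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        "status": status_code,
        "server": lowered.get("server"),  # software that generated the reply
        "via": lowered.get("via"),        # often added by intermediaries
    }


# Intended use with requests (not executed here; url and proxies are the
# variables from the snippet at the top of the question):
#   r = requests.get(url, proxies=proxies)
#   print(summarize_response(r.status_code, r.headers))
```

If two unrelated sites report the same `Server` value, or a `Via` entry names the proxy host, that would suggest the 404 page was produced by an intermediary and not by the target web server.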

Upvotes: 1

Views: 15089

Answers (1)

Mayur Koshti

Reputation: 1852

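import requests

URL = 'https://www.blahblah.com'
# Give the proxy URL with an explicit scheme. requests selects the proxy by the
# scheme of the target URL, so an https:// URL needs an 'https' entry as well.
proxy = {'http': 'http://www.blahblah.com', 'https': 'http://www.blahblah.com'}
r = requests.get(URL, proxies=proxy)
print(r.text)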
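
The proxy URL above carries an explicit scheme (http://), unlike the one in the question. Below is a minimal sketch of a helper that adds the scheme when it is missing. The names `with_scheme` and `build_proxies` are hypothetical and not part of requests, and whether the missing scheme is what actually produced the 404 is an assumption.

```python
def with_scheme(proxy_url, default_scheme="http"):
    """Return proxy_url unchanged if it has a scheme, else prefix one."""
    if "://" in proxy_url:
        return proxy_url
    return "{}://{}".format(default_scheme, proxy_url)


def build_proxies(proxy_url):
    """Build a proxies mapping that covers both http and https targets."""
    full = with_scheme(proxy_url)
    # requests looks up the proxy by the scheme of the URL being fetched,
    # so the same proxy is registered under both keys.
    return {"http": full, "https": full}
```

The result can be passed straight to requests, for example `requests.get(URL, proxies=build_proxies('my_proxy.blabla.com/'))`.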

Upvotes: 1
