Slater Victoroff

Reputation: 21914

Requests ignores invalid proxy

I'm learning to use proxies with requests, but I've run into a big issue: requests doesn't seem to care whether the provided proxy is valid or not. This makes it almost impossible to tell whether anything is actually working, and I'm honestly at a loss for what to do. The documentation requests provides on proxies is very minimal.

My code grabs a User-Agent string and a proxy from a list like so:

proxy = {"https": "https://%s:%s@%s" % (USERNAME, PASSWORD, random.choice(PROXY_LIST))}
headers = {"User-Agent": random.choice(USER_AGENT_LIST)}
return partial(requests.get, proxies=proxy, headers=headers)

An example of a PROXY_LIST entry: 185.46.87.199:8080

The issue is that I can change the username, the password, etc., and requests doesn't seem to notice or care. A large portion of the requests being sent aren't going through a proxy at all. Is there any way to test proxies, i.e. to see whether a request is actually going through the provided proxy? Any tools for debugging and/or fixing this would be immensely appreciated.

After larsks's suggestion, I changed the logging level to DEBUG and got the following output:

INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): mobile.twitter.com
DEBUG:requests.packages.urllib3.connectionpool:"GET /motivesbylorenr HTTP/1.1" 404 1318

The output is unchanged whether the auth is correct or incorrect, and there is no mention of a proxy in the debug information. Again, the requests are going out through my local IP.

Upvotes: 0

Views: 702

Answers (1)

larsks

Reputation: 311258

Requests logs debugging information at the DEBUG priority, so if you enable debug logging via the logging module you can see a variety of diagnostics. For example:

>>> import logging
>>> logging.basicConfig(level='DEBUG')

With that in place, I can run:

>>> import requests
>>> s = requests.Session()
>>> s.headers = {'user-agent': 'my-test-script'}
>>> s.proxies = {'http': 'http://localhost:8123',
...              'https': 'http://localhost:8123'}
>>> s.get('http://mentos.com')

And see:

INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): localhost
DEBUG:requests.packages.urllib3.connectionpool:"GET http://mentos.com/ HTTP/1.1" 301 0
DEBUG:requests.packages.urllib3.connectionpool:"GET http://us.mentos.com HTTP/1.1" 200 32160
<Response [200]>

That clearly shows the connection to the proxy.

This is hopefully enough to get you started. I'm using a Session here, but your solution using partial would behave similarly.
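
For instance, a rough sketch of the same logging applied to a partial-based getter might look like this (the USERNAME, PASSWORD, PROXY_LIST, and USER_AGENT_LIST values below are placeholders standing in for the ones from the question, and the request will of course fail unless the proxy is real):

import logging
import random
from functools import partial

import requests

# Enable debug logging so urllib3's connection messages are visible.
logging.basicConfig(level='DEBUG')

# Placeholder values, not real credentials.
USERNAME = 'user'
PASSWORD = 'secret'
PROXY_LIST = ['185.46.87.199:8080']
USER_AGENT_LIST = ['my-test-script']

def make_getter():
    # Same construction as in the question: a partial GET bound to a
    # randomly chosen proxy and User-Agent.
    proxy = {"https": "https://%s:%s@%s" % (USERNAME, PASSWORD, random.choice(PROXY_LIST))}
    headers = {"User-Agent": random.choice(USER_AGENT_LIST)}
    return partial(requests.get, proxies=proxy, headers=headers)

get = make_getter()
get("https://mobile.twitter.com/motivesbylorenr")

If the proxy is actually being used, the "Starting new HTTPS connection" line will name the proxy host rather than mobile.twitter.com.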

Compare the above output to the log message when requests is not using a proxy:

>>> requests.get('http://mentos.com')
INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): mentos.com
DEBUG:requests.packages.urllib3.connectionpool:"GET / HTTP/1.1" 301 0
INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): us.mentos.com
DEBUG:requests.packages.urllib3.connectionpool:"GET / HTTP/1.1" 200 10566
<Response [200]>

Here, we see the initial connection opened to the remote site, rather than the proxy, and the GET requests do not include the hostname.
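
Another way to confirm where a request is really coming from (not shown in the logging output above; httpbin.org/ip is just one convenient echo service) is to compare the IP address the remote end sees with and without the proxies argument:

import requests

proxies = {'http': 'http://localhost:8123',
           'https': 'http://localhost:8123'}

# httpbin.org/ip echoes back the IP address the request arrived from.
direct = requests.get('https://httpbin.org/ip').json()['origin']
via_proxy = requests.get('https://httpbin.org/ip', proxies=proxies).json()['origin']

print('direct:   ', direct)
print('via proxy:', via_proxy)   # differs from `direct` only if the proxy was really used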

Update

The same setup as above, with an HTTPS URL:

>>> response = s.get('https://google.com')
>>> response
<Response [200]>

Note that I am setting both the http and https keys in the proxies dictionary.
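
Applied to the question's snippet, that might look something like the following (the credentials and host are placeholders, and the proxy URL mirrors the http:// form used in my example above):

PROXY = '185.46.87.199:8080'   # host:port entry from the question's PROXY_LIST
AUTHED = 'http://%s:%s@%s' % ('user', 'secret', PROXY)   # placeholder credentials

# Cover both schemes so neither http:// nor https:// requests bypass the proxy.
proxies = {'http': AUTHED,
           'https': AUTHED}

With only an https key in the dictionary, any request to a plain http:// URL will go out directly from your own IP.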

Upvotes: 3
