Reputation: 21914
I'm learning to use proxies when making requests, but I've run into a big issue: requests doesn't seem to care whether a provided proxy is valid or not. This makes it almost impossible to tell if anything is actually working, and I'm honestly at a loss for what to do. The documentation on proxies in requests is very minimal.
My code grabs a User-Agent string and a proxy from a list like so:
proxy = {"https": "https://%s:%s@%s" % (USERNAME, PASSWORD, random.choice(PROXY_LIST))}
headers = {"User-Agent": random.choice(USER_AGENT_LIST)}
return partial(requests.get, proxies=proxy, headers=headers)
an example of a PROXY_LIST entry: 185.46.87.199:8080
The issue is that I can change the username, change the password, etc., and requests doesn't seem to notice or care. A large portion of the requests being sent aren't going through a proxy at all. Is there any way to test proxies, i.e. to see whether a request is actually going through a provided proxy? Any tools for debugging and/or fixing this would be immensely appreciated.
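One way to test a proxy directly is to ask an IP-echo service what address the request arrived from, with and without the proxy. This is only a sketch: httpbin.org/ip is an assumption here (any endpoint that echoes the caller's IP works), and proxy_is_working is a hypothetical helper name:

```python
import requests

def build_proxies(proxy_url):
    # requests selects a proxy by URL scheme, so cover both http and https
    return {"http": proxy_url, "https": proxy_url}

def origin_ip(session):
    # httpbin.org/ip echoes back the IP the request arrived from
    return session.get("https://httpbin.org/ip", timeout=10).json()["origin"]

def proxy_is_working(proxy_url):
    direct = origin_ip(requests.Session())
    proxied = requests.Session()
    proxied.proxies = build_proxies(proxy_url)
    # A live proxy should make httpbin report a different origin IP
    return origin_ip(proxied) != direct
```

If the proxy is dead or being silently bypassed, the two reported IPs will match and proxy_is_working returns False.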
After a suggestion by larsks, I changed the logging level to DEBUG and got the following output:
INFO:requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): mobile.twitter.com
DEBUG:requests.packages.urllib3.connectionpool:"GET /motivesbylorenr HTTP/1.1" 404 1318
The output is unchanged whether the auth is correct or incorrect, and there is no mention of a proxy in the debug information. Again, requests are going out through my local IP.
Upvotes: 0
Views: 702
Reputation: 311258
Requests logs debugging information at the DEBUG priority, so if you enable debug logging via the logging module you can see a variety of diagnostics. For example:
>>> import logging
>>> logging.basicConfig(level='DEBUG')
With that in place, I can run:
>>> import requests
>>> s = requests.Session()
>>> s.headers={'user-agent': 'my-test-script'}
>>> s.proxies={'http': 'http://localhost:8123',
... 'https': 'http://localhost:8123'}
>>> s.get('http://mentos.com')
And see:
INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): localhost
DEBUG:requests.packages.urllib3.connectionpool:"GET http://mentos.com/ HTTP/1.1" 301 0
DEBUG:requests.packages.urllib3.connectionpool:"GET http://us.mentos.com HTTP/1.1" 200 32160
<Response [200]>
That clearly shows the connection to the proxy. This is hopefully enough to get you started. I'm using a Session here, but your solution using partial would behave similarly.
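As a sketch, the partial-based version of the same configuration might look like this (the proxy address, credentials, and User-Agent are placeholders, not working values):

```python
from functools import partial
import requests

# Placeholder credentials and proxy address -- substitute real values.
proxy_url = "http://user:pass@185.46.87.199:8080"

# Same idea as the Session example: bake proxies and headers into the callable.
get = partial(requests.get,
              proxies={"http": proxy_url, "https": proxy_url},
              headers={"User-Agent": "my-test-script"})

# get("http://mentos.com") now behaves like s.get("http://mentos.com") above.
```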
Compare the above output to the log messages when requests is not using a proxy:
>>> requests.get('http://mentos.com')
INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): mentos.com
DEBUG:requests.packages.urllib3.connectionpool:"GET / HTTP/1.1" 301 0
INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): us.mentos.com
DEBUG:requests.packages.urllib3.connectionpool:"GET / HTTP/1.1" 200 10566
<Response [200]>
Here, we see the initial connection opened directly to the remote site rather than to the proxy, and the GET request lines contain only the path, not the absolute URL.
Update
The above, with HTTPS URLs:
>>> response = s.get('https://google.com')
>>> response
<Response [200]>
Note that I am setting both the http and https keys in the proxies dictionary.
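That detail matters: with only an https key, as in the question, plain-http requests silently bypass the proxy. requests' own scheme-based lookup makes this visible; select_proxy is an internal helper in requests.utils, so treat this as illustrative rather than public API:

```python
from requests.utils import select_proxy  # internal helper; illustrative use

# The question's dictionary: only an "https" key.
https_only = {"https": "https://user:pass@185.46.87.199:8080"}

# A plain-http URL matches no key, so requests connects directly:
no_proxy = select_proxy("http://mentos.com", https_only)   # None

# An https URL does match, and the proxy is used:
via_proxy = select_proxy("https://mentos.com", https_only)
```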
Upvotes: 3