Truub

Reputation: 107

Python Requests library with proxies - GET request still sends my own IP

I am trying to do some web scraping for a project for my studies. Unfortunately I need to scrape some data off Google Scholar, which blocks my requests. I have tried using (multiple) HTTP proxies, but my requests still get blocked after ~300 tries.

The resulting HTML from the blocked requests contains:

 IP address: 145.109...<br/>Time: 2016-05-05T09:23:37Z<br/>URL: 
 https://scholar.google.nl/citations?hl=en&amp;view_op=search_authors
 &amp;mauthors=Perry<br/>

The above IP is my own, while my proxies dict (which selects a proxy from a list at random) and GET request look like this:

proxies = {'http': 'http://<username>:<password>@107.182....:<port>'}

result = requests.get('https://scholar.google.nl/citations?hl=en&view_op=search_authors&mauthors=Perry',
                      proxies=proxies, headers=headers)

The IPs are of course valid and working, and my own IP is not included in the proxy list. Am I doing something wrong?

Edit: For completeness, I have also tried setting authentication like this answer suggests, but the result is the same.

Upvotes: 0

Views: 3216

Answers (1)

mata

Reputation: 69082

In your proxies dict the URL scheme doesn't match the one you're using for your request: you define an http entry in your proxies, but then make an https request. Requests only routes a request through a proxy whose key matches the request's scheme, so your https request bypasses the proxy and goes out from your own IP. If you add a proxy for the https scheme, it should work.
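
As a minimal sketch (placeholder credentials, host and port taken from the question; this assumes the same proxy endpoint can handle both schemes):

# same proxy endpoint for both schemes, assuming the provider supports CONNECT for https
proxy = 'http://<username>:<password>@107.182....:<port>'
proxies = {'http': proxy, 'https': proxy}

result = requests.get('https://scholar.google.nl/citations?hl=en&view_op=search_authors&mauthors=Perry',
                      proxies=proxies, headers=headers)

With an https entry present, Requests tunnels the request through the proxy, so Google Scholar sees the proxy's IP rather than yours.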

Upvotes: 2
