Python Requests library with proxies - GET request still sends my own IP

Question

I am trying to do some web scraping for a study project. Unfortunately, I need to scrape some data from Google Scholar, which blocks my requests. I have tried using (multiple) HTTP proxies, but my requests still get blocked after ~300 tries.

The resulting html from the blocked requests contains:

IP address: 145.109...
Time: 2016-05-05T09:23:37Z
URL: https://scholar.google.nl/citations?hl=en&view_op=search_authors&mauthors=Perry

The above IP is my own, while my proxies dict (it selects a proxy from a list at random) and get request look like this:

proxies = {'http': 'http://:@107.182....:'}

result = requests.get('https://scholar.google.nl/citations?hl=en'
                      '&view_op=search_authors&mauthors=Perry',
                      proxies=proxies, headers=headers)

The proxy IPs are of course valid and working, and my own IP is not included in the proxy list. Am I doing something wrong?

Edit: For completeness, I have also tried setting authentication like this answer suggests, but the result is the same.

mata · Accepted Answer

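In your proxies dict, the URL scheme doesn't match the one you're using for your request: you have an http entry in your proxies, but then make an https request. Requests picks the proxy by the scheme of the target URL, so with no https entry the request goes out directly from your own IP. If you add a proxy for the https scheme, it should work.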
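
As a rough sketch of what that looks like (not the asker's actual setup): the proxy address below is a made-up placeholder from a documentation-only IP range, and `build_proxies`, `proxy_for`, and `fetch` are hypothetical helper names. `proxy_for` is only a simplified illustration of scheme-based lookup; Requests itself also considers host-specific and `all` keys.

```python
from urllib.parse import urlparse

# Hypothetical placeholder proxy (documentation-only address), not from the question.
PROXY_URL = "http://203.0.113.10:8080"


def build_proxies(proxy_url):
    """Register the same proxy for both http:// and https:// target URLs."""
    return {"http": proxy_url, "https": proxy_url}


def proxy_for(url, proxies):
    """Simplified illustration of scheme-based lookup: return the proxy
    registered for the URL's scheme, or None if there is no such entry."""
    return proxies.get(urlparse(url).scheme)


def fetch(url, proxies, headers=None):
    """Send the GET request through the given proxies."""
    import requests  # third-party; imported here so the helpers above run without it
    return requests.get(url, proxies=proxies, headers=headers, timeout=10)


url = ("https://scholar.google.nl/citations?hl=en"
       "&view_op=search_authors&mauthors=Perry")

http_only = {"http": PROXY_URL}          # the situation from the question
both = build_proxies(PROXY_URL)          # the suggested fix

print(proxy_for(url, http_only))  # None: no https entry, so no proxy is selected
print(proxy_for(url, both))       # the proxy URL registered under "https"
```

With the `https` key in place, something like `fetch(url, both, headers=headers)` would go through the proxy, assuming the proxy itself supports tunnelling HTTPS traffic.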
