SIM

Reputation: 22440

Can't make use of proxies in the right way

I've written a script in Python that requests some urls through a proxy. I use shuffle() within the script to pick the proxies in random order. The script works to some extent. The problem is that when a request fails because of a bad proxy, the loop simply moves on to the next url. How can I change the script so that it tries every proxy in the list (if necessary) for each url, so that all the urls are fetched?

This is my attempt:

import requests
from random import shuffle

url = "https://stackoverflow.com/questions?page={}&sort=newest"

def get_random_proxies():
    proxies = ['35.199.8.64:80', '50.224.173.189:8080', '173.164.26.117:3128']
    shuffle(proxies)
    return iter(proxies)

for link in [url.format(page) for page in range(1,6)]:
    proxy = next(get_random_proxies())
    try:
        response = requests.get(link,proxies={"http": "http://{}".format(proxy) , "https": "http://{}".format(proxy)})
        print(f'{response.url}\n{proxy}\n')
    except Exception:
        print("something went wrong!!" + "\n")
        proxy = next(get_random_proxies())

Output I'm getting:

https://stackoverflow.com/questions?page=1&sort=newest
35.199.8.64:80

https://stackoverflow.com/questions?page=2&sort=newest
50.224.173.189:8080

something went wrong!!

https://stackoverflow.com/questions?page=4&sort=newest
50.224.173.189:8080

something went wrong!!

You can see that the two urls 'page=3&sort=newest' and 'page=5&sort=newest' were never fetched, even though two of my proxies are still working.

Postscript: These are free proxies, so I published them intentionally.

Upvotes: 0

Views: 110

Answers (1)

jedwards

Reputation: 30200

What about:

def get_random_proxies():
    proxies = ['35.199.8.64:80', '50.224.173.189:8080', '173.164.26.117:3128']
    shuffle(proxies)
    return proxies

for link in [url.format(page) for page in range(1,6)]:
    for proxy in get_random_proxies():
        try:
            response = requests.get(link,proxies={"http": "http://{}".format(proxy) , "https": "http://{}".format(proxy)})
            print(f'{response.url}\n{proxy}\n')
            break  # success, stop trying proxies
        except Exception:
            print("something went wrong!!" + "\n")

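I'm not sure what the plan was with return iter(...) and next(...). Because get_random_proxies() is called afresh each time, next() only ever yields the first proxy of a new shuffle, so the remaining proxies are never tried for the same url. A more traditional approach is to return the list and loop over as much of it as needed. You've already built the list, so returning it takes no extra effort.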
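If you also want to know when every proxy has failed for a url, the same loop can be wrapped in a small helper. The following is only a sketch: the names fetch_with_proxies and fetch_via_requests are made up for illustration, and the 10 second timeout is an assumed value, not something from your script. The fetch function is passed in as an argument so the retry logic can be tried without any network access.

```python
from random import shuffle


def fetch_with_proxies(link, proxies, fetch):
    """Try each proxy in random order until one works.

    Returns (result, proxy) on the first success, or (None, None)
    if every proxy raised an exception. `fetch` is any callable
    taking (link, proxy); it is a hypothetical hook, not a requests API.
    """
    candidates = list(proxies)  # copy so the caller's list is not reordered
    shuffle(candidates)
    for proxy in candidates:
        try:
            return fetch(link, proxy), proxy
        except Exception:
            continue  # this proxy failed, move on to the next one
    return None, None


def fetch_via_requests(link, proxy):
    """One possible `fetch` built on requests.get (assumed timeout: 10 s)."""
    import requests  # imported here so the helper above works without it

    proxy_url = "http://{}".format(proxy)
    response = requests.get(
        link,
        proxies={"http": proxy_url, "https": proxy_url},
        timeout=10,
    )
    response.raise_for_status()  # treat HTTP error codes as failures too
    return response
```

With this, the outer loop would call fetch_with_proxies(link, proxies, fetch_via_requests) once per url and check whether the returned proxy is None to detect a url that no proxy could reach.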

Upvotes: 2
