liyuhao

Reputation: 375

I want to use a proxy to crawl a website. How can I tell whether the proxy is still available?

I have a lot of free proxies in a txt file, and I want to use them to crawl a website. When I use one of them, like 127.0.0.1 below, how can I tell whether it is still usable?

import urllib2

# Route HTTP requests through the proxy, then fetch a page
proxy = urllib2.ProxyHandler({'http': '127.0.0.1'})
opener = urllib2.build_opener(proxy)
urllib2.install_opener(opener)
urllib2.urlopen('http://www.google.com')

Upvotes: 0

Views: 1015

Answers (1)

jsanc623

Reputation: 534

Use this function:

import urllib2

def is_OK(ip, timeout=10):
    """Return True if an HTTP request routed through the proxy succeeds."""
    print 'Trying %s ...' % ip
    try:
        proxy_handler = urllib2.ProxyHandler({'http': ip})
        opener = urllib2.build_opener(proxy_handler)
        opener.addheaders = [('User-agent', 'Mozilla/5.0')]
        # Open through this opener directly instead of installing it globally,
        # and set a timeout so a dead proxy cannot hang the check.
        opener.open('http://www.icanhazip.com', timeout=timeout)
        print '%s is OK' % ip
        return True
    except Exception:
        # Covers urllib2.HTTPError, urllib2.URLError and socket timeouts
        print '%s is not OK' % ip
    return False

From this answer: Python, checking if a proxy is alive?

So you'd just iterate over the file (assuming 1 IP address per line) and check if is_OK() returns True:

with open('ip_addresses.txt') as fp:
    for line in fp:
        ip = line.strip()  # drop the trailing newline read from the file
        if ip and is_OK(ip):  # skip blank lines
            do_something()
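
If you are on Python 3, where urllib2 became urllib.request, a rough equivalent might look like the sketch below. It is only one possible approach, not the code from the linked answer. The names read_proxies, is_alive and filter_alive, the 5 second timeout and the 20 worker threads are my own assumptions. It checks the proxies in parallel, because testing a long list of free proxies one at a time can be slow when many of them are dead.

```python
import http.client
import urllib.request
from concurrent.futures import ThreadPoolExecutor

TEST_URL = 'http://www.icanhazip.com'  # same test page as in the function above


def read_proxies(lines):
    """Return stripped, non-empty entries, skipping lines that start with '#'."""
    cleaned = (line.strip() for line in lines)
    return [entry for entry in cleaned if entry and not entry.startswith('#')]


def is_alive(proxy, url=TEST_URL, timeout=5):
    """Return True if a GET routed through `proxy` completes within `timeout` seconds."""
    handler = urllib.request.ProxyHandler({'http': proxy})
    # A private opener: nothing is installed globally, so threads do not interfere
    opener = urllib.request.build_opener(handler)
    opener.addheaders = [('User-agent', 'Mozilla/5.0')]
    try:
        with opener.open(url, timeout=timeout) as response:
            response.read(1)  # make sure at least part of the body arrives
        return True
    except (OSError, http.client.HTTPException, ValueError):
        # URLError, HTTPError and socket timeouts are OSError subclasses;
        # HTTPException covers malformed replies from a broken proxy
        return False


def filter_alive(proxies, workers=20):
    """Check proxies in parallel and return those that responded, in input order."""
    proxies = list(proxies)
    if not proxies:
        return []
    with ThreadPoolExecutor(max_workers=min(workers, len(proxies))) as pool:
        results = list(pool.map(is_alive, proxies))
    return [proxy for proxy, ok in zip(proxies, results) if ok]
```

You would call it as filter_alive(read_proxies(open('ip_addresses.txt'))) and crawl only with the proxies it returns. Keep in mind that free proxies die often, so a proxy that passes now may still fail a few minutes later. It is worth handling errors in the crawler itself as well.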

Upvotes: 0
