Kunwar Sodhi

Reputation: 223

Checking proxies in python/selenium

I am using Selenium with Python to connect to a site, and I decided to route the connection through proxies. The proxies are scraped with ProxyBroker (a Python module), so I can collect a lot of them. I save them to a text file and, when I need to connect, pick one at random from that file. This is where the problem arises: some of the proxies don't work when connecting to the site. This is the code I am currently using:

    import asyncio
    import random

    from proxybroker import Broker


    async def save(proxies, filename):
        """Save proxies to a file."""
        with open(filename, 'w') as f:
            while True:
                proxy = await proxies.get()
                if proxy is None:
                    break
                proto = 'https' if 'HTTPS' in proxy.types else 'http'
                row = '%s://%s:%d\n' % (proto, proxy.host, proxy.port)
                f.write(row)


    def main():
        proxies = asyncio.Queue()
        broker = Broker(proxies)
        tasks = asyncio.gather(broker.find(types=['HTTP', 'HTTPS'], limit=5),
                            save(proxies, filename='proxies.txt'))
        loop = asyncio.get_event_loop()
        loop.run_until_complete(tasks)


    if __name__ == '__main__':
        main()

    lines = open('proxies.txt').read().splitlines()
    rproxy = random.choice(lines)
    PROXY = rproxy

The scraping part is the example code from the ProxyBroker examples page (https://proxybroker.readthedocs.io/en/latest/examples.html).

So what I want to be able to do is one of two things:

Option 1: Check the proxies right after they are scraped, save only the ones that work to a text file, and use them later.

Option 2: Check the proxy right before connecting to the site. If it works, use it; if it doesn't, try another one.

I don't really know how to do this. A friend suggested using requests to test whether a proxy works, but I'm having trouble with that because I can't figure out how to format the proxy list automatically for use with requests.

Any help/tips are much appreciated. Thanks in advance!!!!

(Edit) I have already tried posts such as these:

Proxy Check in python

https://github.com/ApsOps/proxy-checker

https://www.calazan.com/how-to-use-proxies-with-an-http-session-using-the-python-requests-package/

https://codereview.stackexchange.com/questions/169246/python-proxy-checker-scanner

None of them worked for me :(

Upvotes: 2

Views: 3505

Answers (1)

Federico Rubbi

Reputation: 734

Well, I had a look at the ProxyBroker documentation, and the best solution seems to be checking the built-in attribute proxy.is_working, which is set as follows:

results = []
for proto in ngtrs:
    if proto == 'CONNECT:25':
        result = await self._check_conn_25(proxy, proto)
    else:
        result = await self._check(proxy, proto)
    results.append(result)

proxy.is_working = True if any(results) else False

You can implement it in your code like so:

import asyncio
import random

from proxybroker import Broker


def get_random_proxy():
    """
    Get random proxy from 'proxies.txt'.
    """
    with open('proxies.txt') as file:
        lines = file.read().splitlines()
    # Return the choice so the caller can actually use it.
    return random.choice(lines)


async def save(proxies, filename):
    """
    Save proxies to a file.
    """
    with open(filename, 'w') as file:
        while True:
            proxy = await proxies.get()
            if proxy is None:
                break
            # Save only the proxies that were marked as working.
            if proxy.is_working:
                protocol = 'https' if 'HTTPS' in proxy.types else 'http'
                # f-string, so the placeholders are actually filled in.
                line = f'{protocol}://{proxy.host}:{proxy.port}\n'
                file.write(line)


def main():
    proxies = asyncio.Queue()
    broker = Broker(proxies)
    tasks = asyncio.gather(broker.find(types=['HTTP', 'HTTPS'], limit=5),
                           save(proxies, filename='proxies.txt'))
    loop = asyncio.get_event_loop()
    loop.run_until_complete(tasks)


if __name__ == '__main__':
    main()
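
For Option 2 from the question (checking a proxy right before using it), a rough sketch with requests could look like the one below. It is only an illustration under a few assumptions: the helper names (to_requests_proxies, proxy_works, pick_working_proxy) are made up for this example, the test URL https://httpbin.org/ip and the 5-second timeout are arbitrary choices, and each line of proxies.txt is assumed to be in the 'scheme://host:port' form written by save(). Whether a given proxy should be addressed with an https:// scheme depends on the proxy itself, so that part may need adjusting.

```python
import random


def to_requests_proxies(line):
    """Turn a 'scheme://host:port' line into the mapping that requests expects."""
    url = line.strip()
    # Assumption: the same proxy URL is used for both http and https traffic.
    return {'http': url, 'https': url}


def proxy_works(line, test_url='https://httpbin.org/ip', timeout=5):
    """Return True if a GET through the proxy succeeds within `timeout` seconds."""
    import requests  # third-party; imported here so the other helpers work without it

    try:
        response = requests.get(
            test_url, proxies=to_requests_proxies(line), timeout=timeout
        )
        return response.ok
    except requests.RequestException:
        # Connection errors, timeouts and proxy errors all mean "not usable".
        return False


def pick_working_proxy(lines, check=proxy_works):
    """Try the proxies in random order and return the first one that passes `check`."""
    candidates = [line.strip() for line in lines if line.strip()]
    random.shuffle(candidates)
    for candidate in candidates:
        if check(candidate):
            return candidate
    # Nothing in the list passed the check.
    return None
```

You would then call something like PROXY = pick_working_proxy(open('proxies.txt').read().splitlines()) and fall back to re-scraping when it returns None. The check parameter is there so you can swap in a different test, for example one that requests the actual site you want to reach.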

Upvotes: 3
