Reputation: 223
I am trying to use Selenium and Python to connect to a site, and I decided to route the connection through proxies. The proxies are scraped with ProxyBroker (a Python module), so I can collect a lot of them. I save them to a text file and, when I need to connect, pick one at random from the file. Here is where the problem arises: the chosen proxy often doesn't work when connecting to the site. This is the code I am currently using:
import asyncio
from proxybroker import Broker

async def save(proxies, filename):
    """Save proxies to a file."""
    with open(filename, 'w') as f:
        while True:
            proxy = await proxies.get()
            if proxy is None:
                break
            proto = 'https' if 'HTTPS' in proxy.types else 'http'
            row = '%s://%s:%d\n' % (proto, proxy.host, proxy.port)
            f.write(row)

def main():
    proxies = asyncio.Queue()
    broker = Broker(proxies)
    tasks = asyncio.gather(broker.find(types=['HTTP', 'HTTPS'], limit=5),
                           save(proxies, filename='proxies.txt'))
    loop = asyncio.get_event_loop()
    loop.run_until_complete(tasks)

if __name__ == '__main__':
    main()
import random

lines = open('proxies.txt').read().splitlines()
rproxy = random.choice(lines)
PROXY = rproxy
This code is based on the example from the ProxyBroker examples page (https://proxybroker.readthedocs.io/en/latest/examples.html).
What I want to be able to do is one of two things:
Option 1: Check the proxies right after they are scraped, save only the working ones to a text file, and load them later.
Option 2: Check each proxy right before connecting to the site: if it works, use it; if not, try another one.
I don't really have a clue how to do this. My friend suggested using requests to see whether a proxy works, but I'm having trouble with that because I can't automatically format the proxy list for use with requests.
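For context, my understanding from the requests docs is that a proxy check would look roughly like the sketch below. The test URL (httpbin.org) and the 5-second timeout are just my guesses, not anything from ProxyBroker:

```python
import random
import requests

def check_proxy(proxy_url, test_url='https://httpbin.org/ip', timeout=5):
    """Return True if the proxy can fetch the test URL."""
    # requests expects a dict mapping scheme -> proxy URL.
    proxies = {'http': proxy_url, 'https': proxy_url}
    try:
        response = requests.get(test_url, proxies=proxies, timeout=timeout)
        return response.ok
    except requests.RequestException:
        return False

# Option 2 in miniature: shuffle the saved proxies and take the first working one.
candidates = ['http://127.0.0.1:1']  # e.g. loaded from proxies.txt
random.shuffle(candidates)
working = next((p for p in candidates if check_proxy(p)), None)
```

Since each line saved by the broker is already a full `scheme://host:port` URL, the same string can be reused for both the `http` and `https` keys of the dict.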
Any help/tips are much appreciated. Thanks in advance!
(Edit) I have already tried posts such as these:
https://github.com/ApsOps/proxy-checker
https://www.calazan.com/how-to-use-proxies-with-an-http-session-using-the-python-requests-package/
https://codereview.stackexchange.com/questions/169246/python-proxy-checker-scanner
None of them worked for me :(
Upvotes: 2
Views: 3505
Reputation: 734
Well, I had a look at the ProxyBroker source and found that the best solution is to check the built-in attribute proxy.is_working, which the internal checker sets like this:
results = []
for proto in ngtrs:
    if proto == 'CONNECT:25':
        result = await self._check_conn_25(proxy, proto)
    else:
        result = await self._check(proxy, proto)
    results.append(result)
proxy.is_working = True if any(results) else False
You can implement it in your code like so:
import asyncio
import random

from proxybroker import Broker

def get_random_proxy():
    """Get a random proxy from 'proxies.txt'."""
    lines = open('proxies.txt').read().splitlines()
    return random.choice(lines)

async def save(proxies, filename):
    """Save proxies to a file."""
    with open(filename, 'w') as file:
        while True:
            proxy = await proxies.get()
            if proxy is None:
                break
            # Only keep proxies that passed the broker's checks.
            if proxy.is_working:
                protocol = 'https' if 'HTTPS' in proxy.types else 'http'
                line = f'{protocol}://{proxy.host}:{proxy.port}\n'
                file.write(line)

def main():
    proxies = asyncio.Queue()
    broker = Broker(proxies)
    tasks = asyncio.gather(broker.find(types=['HTTP', 'HTTPS'], limit=5),
                           save(proxies, filename='proxies.txt'))
    loop = asyncio.get_event_loop()
    loop.run_until_complete(tasks)

if __name__ == '__main__':
    main()
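Then, wherever you build your Selenium driver, you can pull a saved proxy at random and hand it to the browser. The Chrome `--proxy-server` flag shown below is one common way to do this; adjust it for whichever driver you use:

```python
import random

def get_random_proxy(filename='proxies.txt'):
    """Pick a random proxy line from the saved file."""
    lines = open(filename).read().splitlines()
    return random.choice(lines)

# Example: hand the proxy to Selenium's Chrome driver.
# from selenium import webdriver
# options = webdriver.ChromeOptions()
# options.add_argument(f'--proxy-server={get_random_proxy()}')
# driver = webdriver.Chrome(options=options)
```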
Upvotes: 3