I am trying to archive a site via scraping (the API is not an option), and I am using proxies to get around the site's rate limit. I am using the Python requests library with a list of working proxies stored in proxies.txt.
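The file is just one proxy per line; I am assuming plain host:port entries here (the addresses below are placeholders):

1.2.3.4:8080
5.6.7.8:3128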
I am using a class called ProxyManager to handle rotating the proxies and verifying them:
ProxyManager.py
import random

class ProxyManager:
    def __init__(self) -> None:
        self._proxie_data = []
        self._verified_proxies = []
        self._current_proxy_index = 0
        # Load one proxy per line from proxies.txt
        with open('./proxies.txt', "r") as file:
            self._verified_proxies = file.read().strip().split("\n")
        self._filter_proxies(debug=True)

    def _filter_proxies(self, debug=False) -> None:
        new_proxies = []
        total_proxies = len(self._proxie_data)
        if debug:
            # Wrap each proxy string in a requests-style proxies dict
            for proxy in self._proxie_data:
                new_proxies.append({'http': proxy})
        self._verified_proxies = new_proxies
        # Start the rotation at a random index
        self._current_proxy_index = random.randint(0, len(self._verified_proxies) - 1)
        return

    def get_current_proxy(self):
        # Advance to the next proxy, wrapping around at the end of the list
        self._current_proxy_index += 1
        if self._current_proxy_index >= len(self._verified_proxies) - 1:
            self._current_proxy_index = 0
        return self._verified_proxies[self._current_proxy_index]
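For reference, my understanding is that requests expects the proxies mapping to have one proxy URL per scheme, roughly like this (the address is a placeholder):

proxies = {
    'http': 'http://1.2.3.4:8080',   # used for http:// requests
    'https': 'http://1.2.3.4:8080',  # used for https:// requests
}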
When doing either of the following:
class Content:
    def __init__(self):
        proxy_manager = ProxyManager()
        self.sess = requests.Session()
        self.sess.get(url, headers=self.headers, proxies=proxy_manager.get_current_proxy())
or
class Content:
    def __init__(self):
        proxy_manager = ProxyManager()
        self.sess = requests.Session()
        self.sess.proxies.update(proxy_manager.get_current_proxy())
My IP address stays the same as my own public IP address and won't change to the proxy's IP address. The answers to other similar questions yield the same result.
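For context, this is roughly how I am checking which IP the requests come from (https://httpbin.org/ip is just the echo endpoint I happen to use; any "what is my IP" service would do):

import requests

from ProxyManager import ProxyManager

proxy_manager = ProxyManager()

# Ask an echo service which IP address it sees for this request
resp = requests.get('https://httpbin.org/ip',
                    proxies=proxy_manager.get_current_proxy())
print(resp.json())  # prints my own public IP, not the proxy's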