Reputation: 29
I'm building a scraper, connected to a Telegram bot, that collects offers on Amazon. However, I run into problems when I try to use a proxy server. The code attached below fails with the following error:
File "/home/X/.local/lib/python3.10/site-packages/seleniumwire/webdriver.py", line 308, in __init__
super().__init__(*args, **kwargs)
TypeError: WebDriver.__init__() got an unexpected keyword argument 'desired_capabilities'
I read that desired_capabilities has been deprecated in the version of Selenium I use (4.11.2), so why do I still get this error?
Here is the code I'm using. The proxy in this case is from ScraperApi, whose documentation at this link gives the code to use for this type of request.
import os

from seleniumwire import webdriver

def start_selenium():
    chromium_options = webdriver.ChromeOptions()
    chromium_options.add_argument("--headless")
    chromium_options.add_argument("--disable-extensions")
    chromium_options.add_argument("--disable-infobars")
    chromium_options.add_argument("--disable-notifications")
    chromium_options.add_argument("--disable-translate")
    chromium_options.add_argument("--incognito")
    # The API key must be interpolated with {...}; without the braces the
    # literal text 'os.environ["SCRAPERAPI_KEY"]' is sent as the password.
    proxy_options = {
        'proxy': {
            'http': f'http://scraperapi:{os.environ["SCRAPERAPI_KEY"]}@proxy-server.scraperapi.com:8001',
            'https': f'http://scraperapi:{os.environ["SCRAPERAPI_KEY"]}@proxy-server.scraperapi.com:8001',
            'no_proxy': 'localhost,127.0.0.1'
        }
    }
    # This is the call that raises the TypeError shown above.
    chromium_driver = webdriver.Remote(command_executor='http://localhost:4444/wd/hub',
                                       options=chromium_options,
                                       seleniumwire_options=proxy_options)
    return chromium_driver
Also, related to this question: would it be possible to keep using my original code, which waits for the items to load, even with the proxy? I also tried the ScrapeOps Proxy. I managed to start it locally, but it sent too many requests in a short time. This is the code I used:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def get_all_deals_ids():
    deals_page = "https://www.amazon.it/......."
    selenium_driver = start_selenium()
    try:
        selenium_driver.get(deals_page)
        WebDriverWait(selenium_driver, 60).until(EC.presence_of_element_located((By.CSS_SELECTOR, "a[class*='DealCard']")))
        elements_urls = [e.get_attribute("href") for e in selenium_driver.find_elements(By.CSS_SELECTOR, "a[class*='DealCard']")]
        deals_urls = []
        for url in elements_urls:
            if is_product(url):
                deals_urls.append(url)
            if ('/deal/' in url) or ('/browse/' in url):
                deals_urls = deals_urls + get_submenus_urls(url)
        product_ids = {}
        for i, url in enumerate(deals_urls, start=1):
            if extract_product_id(url) is not None and extract_product_id(url) != '':
                product_id = extract_product_id(url)
                if product_id not in product_ids:
                    product_ids[product_id] = i
        selenium_driver.quit()
        return [*product_ids.items()]
    except Exception as e:
        print(e)
        selenium_driver.quit()
        return []
Thanks in advance to anyone who can give me some suggestions!
I tried to run the code attached at the beginning with the ScraperApi proxy, but I always got the desired_capabilities error. I then tried another proxy, this time with a local webdriver instead of a remote one, but it made a request for every link it encountered on the deals_page, needlessly consuming my limited API calls.
Upvotes: 1
Views: 1053
Reputation: 15556
It's failing because the selenium-wire library is still using desired_capabilities in the Remote() definition: https://github.com/wkeeling/selenium-wire/blob/master/seleniumwire/webdriver.py#L298
Unfortunately, this issue is not likely to be fixed due to the library being archived: https://github.com/wkeeling/selenium-wire
...which means you'll either need to fork the repo yourself to make that fix, find someone else who already did that, or downgrade your selenium version to one where desired_capabilities still exists.
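If you go the downgrade route, one way to confirm that a given selenium install still accepts the argument is to inspect the constructor signature rather than guess from the version number. This is only a minimal sketch: accepts_keyword is a hypothetical helper name, not part of selenium or selenium-wire, and the check at the bottom only runs if selenium is importable.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if `func` can be called with the keyword argument `name`.

    Hypothetical helper for illustration; not part of selenium.
    """
    params = inspect.signature(func).parameters
    param = params.get(name)
    if param is not None and param.kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    ):
        return True
    # A **kwargs catch-all would also swallow the keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


if __name__ == "__main__":
    try:
        from selenium.webdriver import Remote
    except ImportError:
        print("selenium is not installed")
    else:
        supported = accepts_keyword(Remote.__init__, "desired_capabilities")
        print("desired_capabilities accepted:", supported)
```

If this reports False, selenium-wire's Remote() will fail with the TypeError from the question on that install.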
Or maybe you can do what you want to do without using selenium-wire at all, or just not using Remote(). Most of selenium-wire still works with the latest version of selenium, just not the method you're trying to use.
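As a rough sketch of the "not using Remote()" option, the same proxy settings from the question can be passed to selenium-wire's local Chrome driver instead. I haven't run this against your setup, so treat it as a starting point: build_proxy_options and start_local_selenium are names made up for this example, the host, port, and "scraperapi" user are copied from the question, and it assumes Chrome is installed on the machine running the script.

```python
def build_proxy_options(api_key, host="proxy-server.scraperapi.com", port=8001):
    """Build the selenium-wire options dict for an authenticated HTTP proxy.

    Host, port, and user name are taken from the question, not verified here.
    """
    proxy_url = f"http://scraperapi:{api_key}@{host}:{port}"
    return {
        "proxy": {
            "http": proxy_url,
            "https": proxy_url,
            "no_proxy": "localhost,127.0.0.1",
        }
    }


def start_local_selenium(api_key):
    """Start a local (non-Remote) headless Chrome routed through the proxy."""
    # Imported here so build_proxy_options works without selenium-wire installed.
    from seleniumwire import webdriver

    chromium_options = webdriver.ChromeOptions()
    chromium_options.add_argument("--headless")
    return webdriver.Chrome(
        options=chromium_options,
        seleniumwire_options=build_proxy_options(api_key),
    )
```

You would call it as start_local_selenium(os.environ["SCRAPERAPI_KEY"]) in place of start_selenium(); the WebDriverWait code from the question should not need to change, since the returned object is still a regular driver.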
Upvotes: 0