Reputation: 172
Good night. I'm trying to access https://www.continente.pt/ and all I get is a blank page with a black bar at the top. I'm already using these options:
from selenium import webdriver

url = 'https://www.continente.pt/'
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_argument("disable-infobars")
options.add_argument("--disable-extensions")
driver = webdriver.Chrome(chrome_options=options, executable_path=r'D:\doc\Fiverr\newMercado\chromedriver.exe')
driver.get(url)
It doesn't work; I'm still blocked from loading the content.
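A blank page on an otherwise working site is often bot detection rather than a network problem: some sites sniff `navigator.webdriver` and Chrome's automation switches. A minimal sketch of options that are commonly used to reduce that fingerprint (the flag list and the `build_options` helper are illustrative, not guaranteed to unblock this particular site):

```python
# Sketch: Chrome flags commonly used to reduce Selenium's automation
# fingerprint. Whether they help depends on how the site detects bots.
STEALTH_ARGS = [
    "--start-maximized",
    # In newer Chrome versions this hides navigator.webdriver:
    "--disable-blink-features=AutomationControlled",
]

def build_options():
    # Imported lazily so the sketch can be run/read without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in STEALTH_ARGS:
        options.add_argument(arg)
    # Drop the "Chrome is being controlled by automated software" infobar
    # switch and the automation extension, which some sites sniff for.
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option("useAutomationExtension", False)
    return options
```

You would then pass `build_options()` to `webdriver.Chrome(...)` in place of the options object in the question.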
Upvotes: 1
Views: 2162
Reputation: 172
Well, I found the answer by uninstalling all Chromium-based browsers and their components. Then I installed Opera (built on Chromium 86) and downloaded ChromeDriver 86 to match. After that I got access and haven't been blocked YET (I've already tried the site more than 10 times and it still connects without a problem).
I didn't add any new code, just this:
from selenium import webdriver
url = "https://www.website.com"
driver = webdriver.Chrome()
driver.get(url)
Upvotes: 1
Reputation: 111
Websites have different rules for spiders, mostly summarized in the domain's robots.txt file. Looking at https://www.continente.pt/robots.txt, here is the output:
User-agent: *
Disallow: */private
Disallow: */search
This might suggest that the website owners don't want anyone scraping those paths. Depending on your script and on the website, they may also block access from automated browsers outright. You can also try a different webdriver, such as Firefox.
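Note that the homepage itself is not covered by those Disallow rules. You can check this locally with the standard library's robots.txt parser (a sketch using the rules quoted above; keep in mind that `urllib.robotparser` matches Disallow paths as plain prefixes and does not expand the `*` wildcard):

```python
# Sketch: evaluating the quoted robots.txt rules without any network access.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: */private",
    "Disallow: */search",
])

# The homepage is not disallowed by these rules:
print(rp.can_fetch("MyBot", "https://www.continente.pt/"))  # True
```

So robots.txt alone doesn't explain the blank homepage; it points more toward active bot detection on the server side.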
You can also check whether your IP address is blocked. If it is, try resetting your router if you have dynamic IP addressing, or use a rotating-proxy provider with your script.
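If you do go the proxy route, Chrome accepts a proxy via a command-line switch. A minimal sketch (the `PROXY` address is a placeholder from the TEST-NET range, and `proxied_options` is a hypothetical helper; substitute your provider's endpoint):

```python
# Sketch: routing the browser through a proxy so the site sees a different IP.
PROXY = "http://203.0.113.7:8080"  # placeholder address, not a real proxy

def proxied_options():
    # Imported lazily so the sketch can be run/read without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument(f"--proxy-server={PROXY}")
    return options
```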
Upvotes: 1