gustavo matteo

Reputation: 172

Website Blocking Selenium from Accessing Content

Good night. I'm trying to access https://www.continente.pt/ and all I get is a blank page with a black bar at the top. I'm already using these options:

from selenium import webdriver

url = 'https://www.continente.pt/'
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_argument("disable-infobars")
options.add_argument("--disable-extensions")
driver = webdriver.Chrome(chrome_options=options, executable_path=r'D:\doc\Fiverr\newMercado\chromedriver.exe')
driver.get(url)

It doesn't work; I'm still blocked from loading the content.

[Screenshot: blocked Continente.pt page]
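For what it's worth, Chromium-based browsers expose a few automation hints that some sites check for. Below is a minimal sketch of options that are commonly tried to reduce that fingerprint, assuming the Selenium Python bindings and a chromedriver available on PATH; it is not guaranteed to get past the block:

from selenium import webdriver

url = 'https://www.continente.pt/'

options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
# Drop the "Chrome is being controlled by automated test software" switch
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option("useAutomationExtension", False)
# Ask Blink not to advertise automation (affects navigator.webdriver-style checks)
options.add_argument("--disable-blink-features=AutomationControlled")

driver = webdriver.Chrome(options=options)  # chromedriver resolved from PATH
driver.get(url)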

Upvotes: 1

Views: 2162

Answers (2)

gustavo matteo

Reputation: 172

Well, I found the answer by uninstalling all Chromium-based browsers and all their components. Then I installed Opera (built on Chromium 86) and downloaded ChromeDriver 86 too. After that, I got access and haven't been blocked YET (I've already accessed the site 10+ times and it still connects without a problem).

I didn't add any new code, just this:

from selenium import webdriver

url = "https://www.website.com"

driver = webdriver.Chrome()  # no executable_path given, so chromedriver is resolved from PATH
driver.get(url)
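To confirm which browser build and driver version the session actually picked up (and verify the 86/86 match described above), the driver's capabilities can be printed. A small sketch, assuming the Selenium Python bindings:

from selenium import webdriver

driver = webdriver.Chrome()  # assumes a matching chromedriver is on PATH
caps = driver.capabilities
print(caps.get("browserName"), caps.get("browserVersion"))     # browser in use, e.g. 86.x
print(caps.get("chrome", {}).get("chromedriverVersion"))       # driver build in use
driver.quit()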

Upvotes: 1

Jahziel Rae Arceo

Reputation: 111

Websites have different rules for spiders, mostly summarized in the domain's robots.txt file. Looking at https://www.continente.pt/robots.txt, here is the output:

User-agent: *
Disallow: */private
Disallow: */search

This might suggest that the website owners don't want anyone scraping them. Depending on your script and on the website, they may also block access from spiders. You could also try a different webdriver, maybe Firefox.
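If you want to check the robots.txt rules programmatically before pointing a spider at a URL, Python's standard library ships urllib.robotparser. A small sketch (note that the stdlib parser only does simple prefix matching, so wildcard rules like */search may not be interpreted the way commercial crawlers would):

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("https://www.continente.pt/robots.txt")
rp.read()  # fetches and parses the file

# Check whether a generic user agent may fetch the home page
print(rp.can_fetch("*", "https://www.continente.pt/"))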

You can also check whether your IP address is blocked. If that is the case, either try resetting your router (if it has a dynamic IP address) or find a rotating proxy provider to use with your script.
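If you do route through a proxy, Chrome accepts a --proxy-server switch that can be set via ChromeOptions. A minimal sketch (the proxy address below is a placeholder, not a real endpoint):

from selenium import webdriver

PROXY = "203.0.113.10:3128"  # placeholder address for a hypothetical proxy provider

options = webdriver.ChromeOptions()
options.add_argument(f"--proxy-server=http://{PROXY}")  # send traffic through the proxy

driver = webdriver.Chrome(options=options)  # assumes chromedriver is on PATH
driver.get("https://www.continente.pt/")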

Upvotes: 1
