Lunalight

Reputation: 157

Bypass a cookiewall with Selenium

I would like to scrape job listings from a Dutch job listings website. However, when I try to open the page with Selenium, I run into a cookiewall (new GDPR rules). How do I bypass the cookiewall?

from selenium import webdriver

#launch url
url = "https://www.nationalevacaturebank.nl/vacature/zoeken?query=&location=&distance=city&limit=100&sort=relevance&filters%5BcareerLevel%5D%5B%5D=Starter&filters%5BeducationLevel%5D%5B%5D=MBO"

# create a new Firefox session
driver = webdriver.Firefox()
driver.implicitly_wait(30)
driver.get(url)

Edit: I tried saving the session cookies with pickle:

from selenium import webdriver
import pickle

url = "https://www.nationalevacaturebank.nl/vacature/zoeken?query=&location=&distance=city&limit=100&sort=relevance&filters%5BcareerLevel%5D%5B%5D=Starter&filters%5BeducationLevel%5D%5B%5D=MBO"

driver = webdriver.Firefox()
driver.set_page_load_timeout(20)
driver.get(url)

pickle.dump(driver.get_cookies() , open("NVBCookies.pkl","wb"))

After that, loading the cookies did not work:

for cookie in pickle.load(open("NVBCookies.pkl", "rb")):
    driver.add_cookie(cookie)

InvalidCookieDomainException: Message: Cookies may only be set for the current domain (cookiewall.vnumediaonline.nl)

It looks like I don't get the cookies from the cookiewall, correct?
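
To check that assumption, here is a minimal sketch that splits saved cookies by whether their domain matches a given host, using the usual cookie domain-matching rule (exact match, or the cookie domain is a parent of the host). The function name `split_cookies_by_host` and the sample cookie values are made up for illustration; only the `domain` key mirrors the dicts that `driver.get_cookies()` returns.

```python
def split_cookies_by_host(cookies, host):
    """Split cookie dicts into (matching, rejected) for the given host.

    A cookie matches when its domain equals the host or is a parent
    domain of it (a leading dot on the cookie domain is ignored).
    Cookies without a domain are treated as rejected to keep this simple.
    """
    host = host.lower()
    matching, rejected = [], []
    for cookie in cookies:
        domain = cookie.get("domain", "").lstrip(".").lower()
        if domain and (host == domain or host.endswith("." + domain)):
            matching.append(cookie)
        else:
            rejected.append(cookie)
    return matching, rejected


# Hypothetical sample data shaped like the dicts from driver.get_cookies().
saved = [
    {"name": "consent", "value": "1", "domain": ".nationalevacaturebank.nl"},
    {"name": "wall", "value": "x", "domain": "cookiewall.vnumediaonline.nl"},
]

# The host the browser is on when the cookiewall is showing.
ok, bad = split_cookies_by_host(saved, "cookiewall.vnumediaonline.nl")
print([c["name"] for c in ok])   # ['wall']
print([c["name"] for c in bad])  # ['consent']
```

In a live session the host would come from `urllib.parse.urlparse(driver.current_url).hostname`, and only the matching cookies would be passed to `driver.add_cookie`.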

Upvotes: 1

Views: 2541

Answers (2)

Lunalight

Reputation: 157

from selenium.webdriver.common.by import By  # find_element_by_xpath was removed in Selenium 4
driver.find_element(By.XPATH, '//*[@id="form_save"]').click()

OK, I made Selenium click the accept button, which is also fine by me. I'm not sure whether I'll run into more cookiewalls later.
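
In case more cookiewalls turn up later, the click could be wrapped in a small helper that only clicks when the button exists. This is a rough sketch: `accept_cookiewall_if_present` is a made-up name, the `form_save` id is the one from this site and may differ elsewhere, and the string `"id"` is passed as the locator strategy (it is the value of Selenium's `By.ID`) so the snippet needs no imports.

```python
def accept_cookiewall_if_present(driver, button_id="form_save"):
    """Click the cookiewall accept button if it is on the page.

    Returns True when a button was found and clicked, False otherwise.
    """
    # find_elements returns an empty list when nothing matches,
    # whereas find_element would raise NoSuchElementException.
    buttons = driver.find_elements("id", button_id)
    if not buttons:
        return False
    buttons[0].click()
    return True
```

Calling `accept_cookiewall_if_present(driver)` right after each `driver.get(...)` should then work on pages with and without a cookiewall, as long as the button has loaded by the time the call runs.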

Upvotes: 1

user9640289

Reputation:

Instead of bypassing the cookiewall, why not write code that checks whether it is present, accepts it if so, and otherwise continues with the next operation? See the code below for details.

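import unittest
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By


class PythonOrgSearch(unittest.TestCase):

    def setUp(self):
        # Selenium 4 style: the chromedriver path is passed through a Service object.
        service = Service(executable_path="C:\\Users\\USER\\Downloads\\New folder (2)\\chromedriver_win32\\chromedriver.exe")
        self.driver = webdriver.Chrome(service=service)

    def test_search_in_python_org(self):
        driver = self.driver
        driver.get("https://www.nationalevacaturebank.nl/vacature/zoeken?query=&location=&distance=city&limit=100&sort=relevance&filters%5BcareerLevel%5D%5B%5D=Starter&filters%5BeducationLevel%5D%5B%5D=MBO")

        # find_elements returns an empty list (instead of raising) when the
        # cookiewall button is absent, so the test continues either way.
        buttons = driver.find_elements(By.XPATH, "//div[@class='article__button']//button[@id='form_save']")
        if buttons:
            buttons[0].click()

    def tearDown(self):
        # quit() ends the whole browser session, not just the current window.
        self.driver.quit()

if __name__ == "__main__":
    unittest.main()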

Upvotes: 1
