Shanshan

Reputation: 11

How to bypass disclaimer while scraping a website

For work, I used to be able to scrape the following website with "driver = webdriver.PhantomJS()". I was scraping the price and the date.

https://www.cash.ch/fonds/swisscanto-ast-avant-bvg-portfolio-45-p-19225268/swc/chf

This stopped working a few days ago because of a disclaimer page that I now have to agree to first.

https://www.cash.ch/fonds-investor-disclaimer?redirect=fonds/swisscanto-ast-avant-bvg-portfolio-45-p-19225268/swc/chf

Once I agree in a normal browser, I can see the real content, but the driver apparently does not: the printout is [], so it must still be on the disclaimer URL.

Please see code below.

    from selenium import webdriver
    from bs4 import BeautifulSoup
    import csv
    import os

    driver = webdriver.PhantomJS()
    driver.set_window_size(1120, 550)

    #Swisscanto
    driver.get("https://www.cash.ch/fonds/swisscanto-ast-avant-bvg-portfolio-45-p-19225268/swc/chf")
    s_swisscanto = BeautifulSoup(driver.page_source, 'lxml')
    nav_sc = s_swisscanto.find_all('span', {"data-field-entry": "value"})
    date_sc = s_swisscanto.find_all('span', {"data-field-entry": "datetime"})

    print(nav_sc)
    print(date_sc)
    print("Done Swisscanto")

Upvotes: 1

Views: 1079

Answers (1)

whieronymus

Reputation: 301

This should work (I think the button you want to click is "zustimmen", i.e. "agree"?)

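    from selenium import webdriver

    driver = webdriver.PhantomJS()
    driver.get("https://www.cash.ch/fonds/swisscanto-ast-avant-bvg-portfolio-45-p-19225268/swc/chf")

    # The first visit lands on the disclaimer page; click its "zustimmen" link
    accept_button = driver.find_element_by_link_text('zustimmen')
    accept_button.click()

    # After agreeing, the driver should be on the real fund page
    content = driver.page_source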

More details here: "python selenium click on button"
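
If you want to confirm that the driver really is stuck on the disclaimer before clicking, a rough sketch along these lines might help. The helper names (`is_disclaimer_url`, `redirect_target`, `accept_disclaimer`) are made up for illustration, the disclaimer path is copied from the URL in the question, and the click assumes the accept link text is "zustimmen", which I have not verified against the live page. It also assumes a Selenium 4 style driver, where `find_elements(by, value)` takes the locator strategy as a string and `"link text"` is the value behind `By.LINK_TEXT`.

```python
import time
from urllib.parse import urlparse, parse_qs

# Path of the disclaimer page, taken from the URL in the question (assumption:
# the site keeps using this path).
DISCLAIMER_PATH = "/fonds-investor-disclaimer"


def is_disclaimer_url(url):
    """Return True if the URL points at the disclaimer page."""
    return urlparse(url).path.rstrip("/") == DISCLAIMER_PATH


def redirect_target(url):
    """Return the value of the 'redirect' query parameter, or None."""
    values = parse_qs(urlparse(url).query).get("redirect")
    return values[0] if values else None


def accept_disclaimer(driver, link_text="zustimmen", timeout=10.0, poll=0.5):
    """Click the accept link if the driver is on the disclaimer page.

    Polls until the link shows up or the timeout expires.
    Returns True if a click happened, False otherwise.
    """
    if not is_disclaimer_url(driver.current_url):
        return False
    deadline = time.monotonic() + timeout
    while True:
        # "link text" is the string value behind Selenium's By.LINK_TEXT
        links = driver.find_elements("link text", link_text)
        if links:
            links[0].click()
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

You would call `accept_disclaimer(driver)` right after `driver.get(...)` and only read `driver.page_source` afterwards. Printing `driver.current_url` before and after is a quick way to check whether the redirect back to the fund page actually happened.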

Upvotes: 2
