Vishal Ramesh

Reputation: 111

Selenium still using previous state of page even after clicking a button. How to update the state of the browser/HTML code?

I am using Python to scrape data from a website with Selenium and Beautiful Soup. The page has buttons that change the data displayed in its tables, but this is all handled by JavaScript in the page; the page URL does not change. Selenium successfully renders the JavaScript on page load, but it keeps using the previous state (from before the clicks) and therefore scrapes the same data instead of the new data.

I tried following the solutions given on Obey The Testing Goat, but the wait always timed out without the old page ever going stale. I've tried waiting manually with a time.sleep(10) to give the state time to refresh. I've tried using WebDriverWait to wait until the old page turned stale, and I've looked through the Selenium documentation for possible solutions. The code below attempts the solution presented on that site, but it simply times out no matter what timeout I use.

from contextlib import contextmanager
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support.expected_conditions import staleness_of

class MySeleniumTest():
    # assumes self.browser is a selenium webdriver

    def __init__(self, browser, soup):
        self.browser = browser
        self.soup = soup

    @contextmanager
    def wait_for_page_load(self, timeout=30):
        old_page = self.browser.find_element_by_tag_name('html')
        yield
        WebDriverWait(self.browser, timeout).until(staleness_of(old_page))

    def tryChangingState(self):
        with self.wait_for_page_load(timeout=20):
            og_state = self.soup
            tab = self.browser.find_element_by_link_text('Breakfast')
            tab.click()
            tab = self.browser.find_element_by_link_text('Lunch')
            tab.click()
            new_state = self.soup
            # check if the HTML code has changed
            print(og_state != new_state)
# create tester object
tester = MySeleniumTest(browser, soup)
# try changing state by clicking on the buttons
tester.tryChangingState()

I'm not sure if I'm using it the correct way. I also tried opening a new with self.wait_for_page_load(timeout=20): block after the first click and putting the rest of the code inside it, but this did not work either. I would expect og_state != new_state to print True, implying the HTML changed, but the actual result is False.

Upvotes: 1

Views: 2051

Answers (1)

Vishal Ramesh

Reputation: 111

Original poster here. I found the reason for the issue. The state was being updated in Selenium, but since I was using Beautiful Soup for parsing, the Beautiful Soup object was still built from the page source of the web driver's earlier state. By rebuilding the soup object each time a button was clicked, the scraper was able to gather the new data successfully.

I updated the soup object by simply calling soup = BeautifulSoup(browser.page_source, 'lxml')

In other words, I didn't need to worry about the state of the Selenium web driver; it was simply a matter of updating the source code the parser was reading.
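For anyone hitting the same problem, here is a minimal sketch of the idea. Since a live web driver isn't available here, the before_click and after_click strings below are stand-ins for what browser.page_source would return before and after a click, and 'html.parser' is used in place of 'lxml' to avoid an extra dependency:

```python
from bs4 import BeautifulSoup

def refresh_soup(page_source):
    # Rebuild the BeautifulSoup object from the driver's current page
    # source; a soup built earlier never sees later JavaScript changes.
    return BeautifulSoup(page_source, 'html.parser')

# Hypothetical page sources before and after clicking a tab
# (stand-ins for browser.page_source).
before_click = '<table><tr><td>Breakfast data</td></tr></table>'
after_click = '<table><tr><td>Lunch data</td></tr></table>'

og_state = refresh_soup(before_click)
new_state = refresh_soup(after_click)

print(og_state.td.text)   # Breakfast data
print(new_state.td.text)  # Lunch data
```

In the real scraper, you would call refresh_soup(browser.page_source) immediately after each tab.click() and parse that fresh soup instead of the one created at load time.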

Upvotes: 1
