Bryan Torres

Reputation: 101

How to check if a web page's content has been changed using Selenium's webdriver with Python?

I tried comparing old_page_source with new_page_source every 20 seconds, but this has not worked for me.

import time
from selenium import webdriver

# the webpage being analyzed (driver.get needs the scheme)
URL = "http://www.somewebsite.com"

# using google chrome as my browser
driver = webdriver.Chrome('chromedriverfilepath')

# 5 trials to see how often page gets updated. Currently unsuccessful
for x in range(1, 6):
    driver.get(URL)
    old_page_source = driver.page_source

    print("\n\nTRIAL %d, first page fetched at time.... %s" % (x, time.strftime('Time: %H:%M:%S')))

    driver.get(URL)
    new_page_source = driver.page_source

    # keep checking every 20 seconds until page is updated/changed
    while old_page_source == new_page_source:
        time.sleep(20)
        driver.get(URL)
        new_page_source = driver.page_source

    print("page was changed at time.... " + time.strftime('Time: %H:%M:%S'))

Upvotes: 5

Views: 9530

Answers (2)

Louis

Reputation: 151391

You cannot rely on page_source for what you are doing. What Selenium will report is most likely going to be what the browser first received. As the docs mention:

Get the source of the last loaded page. If the page has been modified after loading (for example, by Javascript) there is no guarantee that the returned text is that of the modified page. Please consult the documentation of the particular driver being used to determine whether the returned text reflects the current state of the page or the text last sent by the web server. The page source returned is a representation of the underlying DOM: do not expect it to be formatted or escaped in the same way as the response sent from the web server. Think of it as an artist's impression.

(Emphasis mine. The doc is for the Java bindings, but the behavior is determined by the part of Selenium that lives browser-side rather than by the language bindings, so this applies to the Python bindings too.)

What you should be doing to get the actual state of the page is:

driver.execute_script("return document.documentElement.outerHTML")

This will give you a serialization of the DOM tree of the entire page.
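
As a rough sketch of how that call could drive the polling loop from the question: the helper names `dom_fingerprint` and `wait_for_dom_change`, the `max_checks` cap and the injectable `sleep` parameter are assumptions made for illustration and are not part of Selenium. The only driver methods used are `get` and `execute_script`. Hashing the serialization just avoids keeping two full copies of the page around; comparing the strings directly would work as well.

```python
import hashlib
import time


def dom_fingerprint(driver):
    """Return a SHA-256 hex digest of the current DOM serialization."""
    html = driver.execute_script("return document.documentElement.outerHTML")
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


def wait_for_dom_change(driver, url, interval=20, max_checks=10, sleep=time.sleep):
    """Reload `url` every `interval` seconds until the DOM fingerprint changes.

    Returns True if a change was seen, False after `max_checks` reloads.
    """
    driver.get(url)
    baseline = dom_fingerprint(driver)
    for _ in range(max_checks):
        sleep(interval)
        driver.get(url)
        if dom_fingerprint(driver) != baseline:
            return True
    return False
```

With a real browser this would be called as `wait_for_dom_change(driver, "http://www.somewebsite.com")`. Note that pages embedding per-request data (timestamps, session tokens) will register as changed on every reload.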

Upvotes: 2

jmunsch

Reputation: 24089

If you only want to compare textual differences, you could grab the text from the body tag. The page source may change every time the page is loaded (e.g. session-based information), in which case the sources never compare equal and the while loop is never entered.

import time
from selenium.webdriver.common.by import By

# compare only the visible text of <body>, not the raw page source
original = driver.find_element(By.TAG_NAME, "body").text
newer = original
while original == newer:
    time.sleep(20)
    driver.get("http://www.somewebsite.com")
    newer = driver.find_element(By.TAG_NAME, "body").text
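
Once the loop exits, it can help to see which lines of text actually differ, for example to tell a real update from session noise. A minimal sketch using the standard library's difflib follows; the helper name `text_diff` is made up for illustration, and it assumes `original` and `newer` are the two body-text strings from the loop above.

```python
import difflib


def text_diff(old_text, new_text):
    """Return the lines removed from or added to the body text.

    Removed lines are prefixed with "- " and added lines with "+ ",
    following difflib.ndiff's output format.
    """
    delta = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    # ndiff also emits unchanged lines ("  ") and hint lines ("? "); drop those.
    return [line for line in delta if line.startswith(("- ", "+ "))]
```

For example, `for line in text_diff(original, newer): print(line)` would list the changed lines after a difference is detected.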

Upvotes: 1
