Reputation: 101
Comparing old_page_source with new_page_source at time intervals of 20 seconds has been unsuccessful for me.
    import time

    from selenium import webdriver

    # using Google Chrome as my browser
    driver = webdriver.Chrome('chromedriverfilepath')
    # 5 trials to see how often the page gets updated. Currently unsuccessful
    for x in range(1, 6):
        # the webpage being analyzed (driver.get needs a full URL, scheme included)
        driver.get("http://www.somewebsite.com")
        old_page_source = driver.page_source
        print("\n\nTRIAL %d, first page fetched at time.... " % x + time.strftime('Time: %H:%M:%S'))
        driver.get("http://www.somewebsite.com")
        new_page_source = driver.page_source
        # keep checking every 20 seconds until page is updated/changed
        while old_page_source == new_page_source:
            time.sleep(20)
            driver.get("http://www.somewebsite.com")
            new_page_source = driver.page_source
        print("page was changed at time.... " + time.strftime('Time: %H:%M:%S'))
Upvotes: 5
Views: 9530
Reputation: 151391
You cannot rely on page_source for what you are doing. What Selenium reports is most likely going to be what the browser first received. As the docs mention:
Get the source of the last loaded page. If the page has been modified after loading (for example, by Javascript) there is no guarantee that the returned text is that of the modified page. Please consult the documentation of the particular driver being used to determine whether the returned text reflects the current state of the page or the text last sent by the web server. The page source returned is a representation of the underlying DOM: do not expect it to be formatted or escaped in the same way as the response sent from the web server. Think of it as an artist's impression.
(Emphasis mine. The doc is for the Java bindings but the behavior is not determined by the Java bindings but by the part of Selenium that lives browser-side. So this applies to the Python bindings too.)
What you should be doing to get the actual state of the page is:
driver.execute_script("return document.documentElement.outerHTML")
This will give you a serialization of the DOM tree of the entire page.
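To make the polling idea concrete, here is a minimal sketch of a loop that waits for such a snapshot to change. The helper name wait_for_change and its parameters (get_snapshot, interval, max_checks, sleep) are made up for this illustration and are not part of Selenium; the snapshot source is passed in as a callable so the loop itself can be tried without a browser.

```python
import time


def wait_for_change(get_snapshot, interval=20, max_checks=None, sleep=time.sleep):
    """Poll get_snapshot() until its return value differs from the first one.

    Returns the changed snapshot, or None if max_checks polls saw no change.
    """
    baseline = get_snapshot()
    checks = 0
    while max_checks is None or checks < max_checks:
        sleep(interval)
        current = get_snapshot()
        if current != baseline:
            return current
        checks += 1
    return None


# Hypothetical usage with an existing Selenium `driver` (not run here).
# Reload inside dom_snapshot() first if the change only shows up on a fresh load.
# def dom_snapshot():
#     return driver.execute_script("return document.documentElement.outerHTML")
#
# changed_html = wait_for_change(dom_snapshot, interval=20)
```

Passing max_checks keeps the loop from running forever on a page that never changes; leaving it as None reproduces the open-ended wait from the question.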
Upvotes: 2
Reputation: 24089
If you only want to compare textual differences, you could grab the text from the body tag. The page source may change every time the page is loaded (e.g. session-based information), in which case the two sources already differ on the first comparison and the while loop is never entered.
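    import time

    from selenium.webdriver.common.by import By

    # assumes `driver` has already loaded the page once
    # (find_element_by_tag_name was removed in Selenium 4; By.TAG_NAME works in old and new versions)
    original = driver.find_element(By.TAG_NAME, "body").text
    newer = original
    # reload every 20 seconds until the visible body text changes
    while original == newer:
        time.sleep(20)
        driver.get("http://www.somewebsite.com")
        newer = driver.find_element(By.TAG_NAME, "body").text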
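Once the loop exits, it can help to see what actually changed. The following is only a sketch using the standard library's difflib; the helper name text_diff is invented for this example, and the two strings are stand-ins for the original and newer body text captured by the loop above.

```python
import difflib


def text_diff(old_text, new_text):
    """Return unified-diff lines describing how new_text differs from old_text."""
    return list(difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile="original",
        tofile="newer",
        lineterm="",
    ))


# Stand-in strings; with Selenium these would be the two body.text values.
original = "Headline\nPrice: 10\nFooter"
newer = "Headline\nPrice: 12\nFooter"
for line in text_diff(original, newer):
    print(line)
```

Lines starting with - appear only in the original text and lines starting with + only in the newer text, which makes it easy to tell a real content update from noise such as a timestamp.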
Upvotes: 1