Reputation: 689
I'm trying to scrape dynamic content from a blog with Selenium, but it always returns unrendered JavaScript.
To test this behavior, I waited until the iframe loaded completely and printed its content, which prints fine. But when I move back to the parent frame, it displays unrendered JavaScript again.
I'm looking for a way to print the completely rendered HTML content.
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions
driver = webdriver.Chrome("path to chrome driver")
driver.get('http://justgivemechocolateandnobodygetshurt.blogspot.com/')
WebDriverWait(driver, 40).until(expected_conditions.frame_to_be_available_and_switch_to_it((By.ID, "navbar-iframe")))
# Rendered iframe HTML is printed.
content = driver.page_source
print content.encode("utf-8")
# After switching back to the parent frame, it prints unrendered JavaScript again.
driver.switch_to.parent_frame()
content = driver.page_source
print content.encode("utf-8")
Upvotes: 2
Views: 5294
Reputation: 474161
The problem is that .page_source works only in the current context. There is that "current top-level browsing context" notion. Meaning, if you call it on the default content, you will not get the inner HTML of the child iframe elements. For that, you have to switch into the context of a frame and call .page_source there.
In other words, to get the complete HTML of the page, including the page source of the iframes, you have to switch into the iframe contexts one by one and get the sources separately.
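As a rough illustration of switching into the iframes one by one, here is a minimal sketch. The helper name collect_page_sources is hypothetical (it is not part of Selenium), it only visits iframes that are direct children of the top-level document, and it assumes the frames have already finished loading. The string "tag name" is the value of By.TAG_NAME, written out so the helper itself needs no imports.

```python
def collect_page_sources(driver):
    """Return the top-level page source followed by the source of each
    direct child iframe, in document order.

    `driver` is assumed to be an already-started Selenium WebDriver.
    Nested iframes (iframes inside iframes) are not visited.
    """
    # Start from the top-level document so the result is predictable
    # no matter which frame the driver was in before the call.
    driver.switch_to.default_content()
    sources = [driver.page_source]

    # "tag name" is the value of By.TAG_NAME.
    frames = driver.find_elements("tag name", "iframe")

    for frame in frames:
        # page_source reflects only the current context, so switch
        # into the frame before reading it.
        driver.switch_to.frame(frame)
        sources.append(driver.page_source)
        # Go back to the top-level document before the next frame.
        driver.switch_to.default_content()

    return sources


# Possible usage, assuming `driver` has already loaded the page:
#
#     for html in collect_page_sources(driver):
#         print(html)
```

The sources are returned as separate strings rather than merged into one document, since Selenium gives no single call that inlines the iframe markup into the parent page.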
Old answer:
I would wait for at least one blog entry to be loaded before getting the page_source:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
wait = WebDriverWait(driver, 40)
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".entry-content")))
print(driver.page_source)
Upvotes: 4