Reputation: 1
I am trying to extract the contents of some news articles. Some of the URLs require logging in to access the full content, so I decided to use Selenium to automate the login. However, I cannot extract any content: the first URL takes forever to load, so the code never reaches the point where the text is actually extracted and eventually throws a timeout exception.
Here is my code:
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager
from fake_useragent import UserAgent

for url in url_list:
    chrome_options = Options()
    ua = UserAgent()
    userAgent = ua.random
    # Add the argument to the same Options object that is passed to the driver
    chrome_options.add_argument(f'user-agent={userAgent}')
    driver = webdriver.Chrome(ChromeDriverManager().install(), options=chrome_options)
    driver.get(url)
    time.sleep(5)
    frame = driver.find_elements_by_xpath('//iframe[@id="wallIframe"]')
    # Some articles require going through a paywall and some don't
    if len(frame) == 0:
        text_element = driver.find_elements_by_xpath('//section[@id="main-content"]//article//p')
        text = " ".join(x.text for x in text_element)
    else:
        text = log_in(frame)
    driver.quit()
Although the code never reaches it, here is my log_in function:
def log_in(frame):
    # Relies on the module-level driver, username and password
    driver.switch_to.frame(frame[0])
    driver.find_element_by_id("PAYWALL_V2_SIGN_IN").click()
    time.sleep(2)
    driver.find_elements_by_id("username")[0].send_keys(username)
    time.sleep(2)
    driver.find_elements_by_xpath('//button[text()="Continue"]')[0].click()
    time.sleep(1)
    driver.find_elements_by_id("password")[0].send_keys(password)
    time.sleep(1)
    # click() returns None, so keep a reference to the element before clicking it
    element = driver.find_elements_by_xpath('//button[@type="submit"]')[0]
    element.click()
    time.sleep(1)
    text = parse_text(element)
    # Return the text so that the caller's text = log_in(frame) receives it
    return text
How can I get around this?
Upvotes: 0
Views: 118
Reputation: 1331
Instead of pausing for fixed intervals with time.sleep, you should use WebDriverWait along with expected_conditions; this way the action on your element is performed only once a certain condition is satisfied (for example, when the element is visible or clickable).
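To illustrate why this differs from a fixed sleep, here is a minimal stand-alone sketch of the polling idea behind an explicit wait. It is not Selenium's actual implementation; `wait_until`, `WaitTimeout` and the toy `ready` condition are hypothetical names used only for this illustration.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never becomes truthy (hypothetical name)."""


def wait_until(condition, timeout=30.0, poll_interval=0.5):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            # Proceed as soon as the condition holds, without waiting any longer.
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Toy condition standing in for "the element is now clickable":
# it becomes truthy on the third poll.
attempts = []


def ready():
    attempts.append(1)
    return "element" if len(attempts) >= 3 else False


found = wait_until(ready, timeout=2.0, poll_interval=0.01)
```

Unlike time.sleep(5), which always costs five seconds, the call returns as soon as the condition holds and only fails once the whole timeout has elapsed. Selenium's WebDriverWait(...).until(...) follows the same pattern, as in the snippet that follows.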
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException

try:
    # Wait up to 30 seconds for the paywall iframe to be present in the DOM
    frame = WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.XPATH, '//iframe[@id="wallIframe"]')))
except TimeoutException:
    print("Element not found.")
Upvotes: 1