Sean_Boothby

Reputation: 177

Selenium webdriver will not fully load page (python)

I have been using the Selenium WebDriver with Python to try to log in to this website (login page: https://app.chatra.io/).

To do this, I did the following in Python:

from selenium import webdriver 
import bs4 as bs


driver = webdriver.Chrome()
driver.get('https://app.chatra.io/')

I then try to parse the page with Beautiful Soup:

html = driver.execute_script('return document.documentElement.outerHTML')
soup = bs.BeautifulSoup(html, 'html.parser')
print(soup.prettify)

The main issue is that the page never fully loads. When I load the page in a browser myself, everything is fine. However, when the Selenium WebDriver loads it, it seems to stop halfway.

Any idea why? Any ideas on how to fix it or where to look to learn?

Upvotes: 4

Views: 20003

Answers (2)

undetected Selenium

Reputation: 193108

There are several aspects to the issue you are facing:

  • Since you want to parse the page with BeautifulSoup, you could try fetching it with urlopen from urllib.request, but the error says it all:

    urllib.error.HTTPError: HTTP Error 403: Forbidden
    

    This means the request from urllib.request is detected and rejected with HTTP Error 403: Forbidden. Hence using webdriver from selenium makes sense.

  • Next, with ChromeDriver and Chrome, the website initially opens and renders. But ChromeDriver, being a WebDriver, is soon detected, and the <head> and <body> tags are missing from the page it returns. All you see is this minimal markup:

    <!DOCTYPE html>
    <html xmlns="http://www.w3.org/1999/xhtml" class="supports cssfilters flexwrap chrome webkit win hover web"></html>
    
  • Finally, with GeckoDriver and Firefox Quantum, the website opens and renders properly:

    Code Block :

    from selenium import webdriver
    from bs4 import BeautifulSoup as soup
    
    driver = webdriver.Firefox(executable_path=r'C:\Utility\BrowserDrivers\geckodriver.exe')
    driver.get('https://app.chatra.io/')
    html = driver.execute_script('return document.documentElement.outerHTML')
    pagesoup = soup(html, "html.parser")
    print(pagesoup)
    

    Console Output :

    <html class="supports cssfilters flexwrap firefox gecko win hover web"><head>
    <link class="" href="https://app.chatra.io/b281cc6b75916e26b334b5a05913e3eb18fd3a4d.css?meteor_css_resource=true&amp;_g_app_v_=51" rel="stylesheet" type="text/css"/>
    <meta charset="utf-8"/>
    <meta content="IE=edge" http-equiv="X-UA-Compatible"/>
    <meta content="width=device-width, initial-scale=1, maximum-scale=1, minimum-scale=1, user-scalable=no, viewport-fit=cover" name="viewport"/>
    .
    .
    .
    <em>··· Chatra</em>
    .
    .
    .
    </div></body></html>
    
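  • Adding prettify to the soup extraction (note that without the call parentheses, print shows the bound method rather than the formatted markup):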

    Code Block :

    from selenium import webdriver
    from bs4 import BeautifulSoup as soup
    
    driver = webdriver.Firefox(executable_path=r'C:\Utility\BrowserDrivers\geckodriver.exe')
    driver.get('https://app.chatra.io/')
    html = driver.execute_script('return document.documentElement.outerHTML')
    pagesoup = soup(html, "html.parser")
    print(pagesoup.prettify)
    

    Console Output :

    <bound method Tag.prettify of <html class="supports cssfilters flexwrap firefox gecko win hover web"><head>
    <link class="" href="https://app.chatra.io/b281cc6b75916e26b334b5a05913e3eb18fd3a4d.css?meteor_css_resource=true&amp;_g_app_v_=51" rel="stylesheet" type="text/css"/>
    <meta charset="utf-8"/>
    <meta content="IE=edge" http-equiv="X-UA-Compatible"/>
    <meta content="width=device-width, initial-scale=1, maximum-scale=1, minimum-scale=1, user-scalable=no, viewport-fit=cover" name="viewport"/>
    .
    .
    .
    <em>··· Chatra</em>
    .
    .
    .
    </div></body></html>>
    
  • You can even use Selenium's page_source property as follows:

    Code Block :

    from selenium import webdriver
    
    driver = webdriver.Firefox(executable_path=r'C:\Utility\BrowserDrivers\geckodriver.exe')
    driver.get('https://app.chatra.io/')
    print(driver.page_source)
    

    Console Output :

<html class="supports cssfilters flexwrap firefox gecko win hover web">

<head>
  <link rel="stylesheet" type="text/css" class="" href="https://app.chatra.io/b281cc6b75916e26b334b5a05913e3eb18fd3a4d.css?meteor_css_resource=true&amp;_g_app_v_=51">
  <meta charset="utf-8">
  <meta http-equiv="X-UA-Compatible" content="IE=edge">
  <meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1, minimum-scale=1, user-scalable=no, viewport-fit=cover">

  <!-- platform specific stuff -->
  <meta name="msapplication-tap-highlight" content="no">
  <meta name="apple-mobile-web-app-capable" content="yes">

  <!-- favicon -->
  <link rel="shortcut icon" href="/static/favicon.ico">

  <!-- win8 tile -->
  <meta name="msapplication-TileImage" content="/static/win-tile.png">
  <meta name="msapplication-TileColor" content="#ffffff">
  <meta name="application-name" content="Chatra">

  <!-- apple touch icon -->
  <!--<link rel="apple-touch-icon" sizes="256x256" href="/static/?????.png">-->

  <title>··· Chatra</title>

  <style>
    body {
      background: #f6f5f7
    }
  </style>

  <style type="text/css"></style>
</head>

<body>



  <script async="" src="https://www.google-analytics.com/analytics.js"></script>
  <script type="text/javascript" src="/meteor_runtime_config.js"></script>

  <script type="text/javascript" src="https://app.chatra.io/9153feecdc706adbf2c71253473a6aa62c803e45.js?meteor_js_resource=true&amp;_g_app_v_=51"></script>



  <div class="body body-layout">
    <div class="body-layout__main main-layout">
      <aside class="main-layout__left-sidebar">
        <div class="left-sidebar-layout">
        </div>
      </aside>
      <div class="main-layout__content">
        <div class="content-layout">


          <main class="content-layout__main is-no-fades js-popover-boundry js-main">

            <div class="center loading loading--light">
              <div class="content-padding nothing">


                <em>··· Chatra</em>


              </div>
            </div>

          </main>
        </div>
      </div>
    </div>
  </div>
</body>
</html>
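
The 403 from the first point above can be reproduced in isolation. Below is a minimal sketch, assuming a local stand-in server (the hypothetical ForbiddenHandler) rather than the real site, of how urllib.request surfaces a 403 as an HTTPError. The names fetch_status and demo are illustrative only and not part of any library.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class ForbiddenHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a site that rejects scripted clients."""

    def do_GET(self):
        # Always answer 403, whatever the path or headers.
        self.send_error(403)

    def log_message(self, format, *args):
        # Silence the default per-request logging on stderr.
        pass


def fetch_status(url):
    """Return the HTTP status for url, including error statuses."""
    # An empty ProxyHandler bypasses any system proxy, so the local
    # demo server is reached directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=5) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; the status is err.code.
        return err.code


def demo():
    """Start the stand-in server, request it once, and return the status."""
    server = HTTPServer(("127.0.0.1", 0), ForbiddenHandler)  # port 0: any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return fetch_status(f"http://127.0.0.1:{server.server_port}/")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

Catching HTTPError this way only tells you that the server refused the request; it does not make the page retrievable, which is why a real browser driven by Selenium is the more practical route here.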

Upvotes: 1

alecxe

Reputation: 473873

First of all, the issue is also reproducible for me in the latest Chrome (with chromedriver 2.34, which is also the latest at the time of writing). I am not yet sure what is happening. Workaround: Firefox worked perfectly for me.


I would also add an extra step between driver.get() and the HTML parsing, namely an explicit wait that blocks until the desired condition is true:

import bs4 as bs
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


driver = webdriver.Firefox()
driver.get('https://app.chatra.io/')

wait = WebDriverWait(driver, 10)
wait.until(EC.visibility_of_element_located((By.ID, "signin-email")))

html = driver.execute_script('return document.documentElement.outerHTML')
soup = bs.BeautifulSoup(html, 'html.parser')
print(soup.prettify())

Note that you also need to call prettify() with parentheses, since it's a method.
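
To make the explicit wait less of a black box: conceptually it polls a condition until the condition returns something truthy or a timeout expires. The following is a rough sketch of that idea in plain Python, not Selenium's actual implementation. The names wait_until, WaitTimeout and make_fake_page are hypothetical, and the fake page merely simulates an element that shows up after a few lookups.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never becomes truthy (hypothetical name)."""


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises WaitTimeout once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll_interval)


def make_fake_page(ready_after):
    """Simulate a page whose element appears after `ready_after` lookups."""
    calls = {"count": 0}

    def find_element():
        calls["count"] += 1
        # Pretend the element is missing until enough lookups have happened.
        return "signin-email" if calls["count"] >= ready_after else None

    return find_element


if __name__ == "__main__":
    # The element "appears" on the third lookup, so the wait succeeds.
    print(wait_until(make_fake_page(3), timeout=2, poll_interval=0.01))
```

The point of waiting on a specific condition, rather than sleeping for a fixed time, is that the script continues as soon as the element is there and fails loudly if it never appears.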

Upvotes: 3
