Reputation: 1929
I am using Selenium to scrape a web page whose content is generated dynamically by JavaScript. It works fine when I run the calls directly from the Python terminal (cmd), but it doesn't work when I wrap the same functionality in a class.
My class implementation is:
from selenium import webdriver


class web_scraper():
    def __init__(self):
        # start chrome driver
        self.driver = webdriver.Chrome(executable_path="./config/chromedriver.exe")

    # scrape web page from specified url
    def scrape_page(self, url):
        html = None
        try:
            # scrape page
            self.driver.get(url)
            # read html
            html = self.driver.execute_script("return document.documentElement.innerHTML;")
        except Exception as e:
            print('[Error:] Scrapping failed.')
            print(f'[Exception:] {e}')
        return html


if __name__ == '__main__':
    url = "https://wipp.edmundsassoc.com/Wipp/?wippid=1205#taxPage9"
    scraper = web_scraper()
    content = scraper.scrape_page(url)
The code I used at the terminal is:
driver = webdriver.Chrome(executable_path='E:/Projects/Python_Projects/WebScraping/config/chromedriver.exe')
driver.get("https://wipp.edmundsassoc.com/Wipp/?wippid=1205#taxPage30")
content = driver.execute_script("return document.documentElement.innerHTML;")
The output of the class implementation is:
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<link type="text/css" rel="stylesheet" href="Wipp.css">
<title>WIPP</title>
<link rel="stylesheet" href="https://wipp.edmundsassoc.com/Wipp/wipp/gwt/standard/standard.css"><script src="https://wipp.edmundsassoc.com/Wipp/wipp/0D3421F8F9508D2F958C63CE2A48BAD8.cache.js"></script></head>
<body>
<script type="text/javascript" language="javascript" src="wipp/wipp.nocache.js"></script>
<iframe src="javascript:''" id="__gwt_historyFrame" tabindex="-1" style="position:absolute;width:0;height:0;border:0"></iframe>
</body>
When I run the same commands in the Python terminal, the output is fine.
Any help with this would be appreciated. Thanks!
I am using Windows and Python 3.6.
Upvotes: 0
Views: 110
Reputation: 521
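Add a time.sleep() call (with import time at the top of the file) after loading the URL, so the page's JavaScript has time to render before you read the HTML: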
self.driver.get(url)
time.sleep(10)
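A fixed sleep can be too short on a slow connection and wastes time on a fast one. As a rough alternative, here is a minimal sketch that polls the page until the HTML looks rendered. The helper name wait_for_html and the is_ready predicate are hypothetical, not part of Selenium; the only Selenium call it relies on is execute_script, which the question already uses.

```python
import time


def wait_for_html(driver, is_ready, timeout=10.0, poll_interval=0.5):
    """Poll the page until is_ready(html) returns True, then return the HTML.

    driver: any object exposing Selenium's execute_script method.
    is_ready: caller-supplied predicate on the HTML string (hypothetical;
              what counts as "rendered" depends on the page).
    Raises TimeoutError if the page is still not ready after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        html = driver.execute_script("return document.documentElement.innerHTML;")
        if is_ready(html):
            return html
        if time.monotonic() >= deadline:
            raise TimeoutError(f"page not ready after {timeout} seconds")
        time.sleep(poll_interval)
```

Inside scrape_page this could replace the execute_script line, for example html = wait_for_html(self.driver, lambda h: "<table" in h). The "<table" marker is only an assumption, since the rendered markup is not shown in the question; pick something that appears only after the JavaScript has run. If you know a specific element to wait for, Selenium's own WebDriverWait with expected_conditions is the more usual choice.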
Upvotes: 1