Rafi Ramadhan

Reputation: 129

Scraping a JavaScript rendered page

I want to extract some data from a JavaScript-rendered page using Selenium WebDriver in Python 3. I have tried several drivers, such as Firefox, ChromeDriver, and PhantomJS, but I always get the same result: instead of the rendered DOM elements, I only get the script.

Here is a snippet of my code:

from selenium import webdriver

url = 'https://www.google.com/flights/explore/#explore;f=BDO;t=r-Asia-0x88d9b427c383bc81%253A0xb947211a2643e5ac;li=0;lx=2;d=2018-01-09'
driver = webdriver.Chrome("/var/chromedriver/chromedriver")
driver.implicitly_wait(20)
driver.get(url)

print(driver.page_source)

Am I missing something here?

Upvotes: 2

Views: 2414

Answers (2)

Mike Zinyoni

Reputation: 1

Use Helium, a Selenium wrapper:

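# pip install helium
import time

import helium

url_one = "https://www.vbiz.in/nseoptionchain.html"
# start_chrome returns the underlying Selenium WebDriver
browser_one = helium.start_chrome(url_one, headless=True)
seconds = 5
time.sleep(seconds)  # give the page's JavaScript time to render
html = browser_one.page_source
browser_one.quit()  # end the session and shut down the browser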

Upvotes: 0

undetected Selenium

Reputation: 193088

I don't see any issues in your code block. I ran your script as follows:

from selenium import webdriver

url = 'https://www.google.com/flights/explore/#explore;f=BDO;t=r-Asia-0x88d9b427c383bc81%253A0xb947211a2643e5ac;li=0;lx=2;d=2018-01-09'
driver = webdriver.Chrome()
driver.get(url)
print(driver.page_source)

I get the following console output:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US">

<head>
  <meta http-equiv="content-type" content="text/html; charset=UTF-8" />
  <meta name="deals::gwt:property" content="baseUrl=/flights/explore//static/" />
  <title>Explore flights</title>
  <meta name="description" content="Explore flights" />
  <script src="https://apis.google.com/_/scs/abc-static/_/js/k=gapi.gapi.en.yoTdpQipo6s.O/m=gapi_iframes,googleapis_client,plusone/rt=j/sv=1/d=1/ed=1/am=AAE/rs=AHpOoo9_VhuRoUovwpPPf5LqLZd-dmCnxw/cb=gapi.loaded_0" async=""></script>
  <script language="javascript" type="text/javascript">
    var __JS_ILT__ = new Date();
    .
    .
    .
</div></div>
<div aria-hidden="true" style="display: none;"><div class="CTPFVNB-l-j CTPFVNB-l-h">Displayed currencies may differ from the currencies used to purchase flights.– <a href="https://www.google.com/intl/en/googlefinance/disclaimer/" class="CTPFVNB-l-k">Disclaimer</a></div></div>
<div aria-hidden="true" style="display: none;"><div class="CTPFVNB-l-j CTPFVNB-l-h">Showing licensed rail data. – <a href="https://www.google.com/intl/en/help/legalnotices_maps.html" class="CTPFVNB-l-k">Legal Notice</a></div></div>
<div class="CTPFVNB-l-i"><a class="CTPFVNB-l-k CTPFVNB-l-j" href="https://www.google.com/intl/en/policies/">Privacy &amp; Terms</a><a class="CTPFVNB-l-k CTPFVNB-l-j" href="https://support.google.com/flights/?hl=en">Help Center</a></div></div></div><iframe id="deals" tabindex="-1" style="position: absolute; width: 0px; height: 0px; border: none; left: -1000px; top: -1000px;">
</iframe><input type="text" id="_bgInput" style="display:none;" /></body></html>

Now, as you can see at the very end of the page_source, there is an iframe. Until you switch to that iframe, you won't be able to find the DOM elements you are looking for.
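
Switching into the iframe could look roughly like the sketch below. It assumes the data really does live inside the iframe with id "deals" seen in the output above, which I have not verified. The helper names inside_frame, frame_page_source and main are hypothetical, made up for this illustration; only switch_to.frame, switch_to.default_content and page_source are actual WebDriver calls.

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame_reference):
    """Switch the driver into a frame, then always return to the top-level page."""
    # frame_reference may be the frame's id/name, its index, or a WebElement
    driver.switch_to.frame(frame_reference)
    try:
        yield driver
    finally:
        # Go back to the main document even if the body raised
        driver.switch_to.default_content()


def frame_page_source(driver, frame_reference):
    """Return the HTML of the document loaded inside the given frame."""
    with inside_frame(driver, frame_reference):
        return driver.page_source


def main():
    # Imported here so the helpers above can be used without launching a browser
    from selenium import webdriver

    url = 'https://www.google.com/flights/explore/#explore;f=BDO;t=r-Asia-0x88d9b427c383bc81%253A0xb947211a2643e5ac;li=0;lx=2;d=2018-01-09'
    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # "deals" is the iframe id taken from the console output above
        print(frame_page_source(driver, "deals"))
    finally:
        driver.quit()
```

Calling main() runs this against the URL from the question. The context manager is only there so the driver is switched back to the main document afterwards, otherwise later find_element calls would keep searching inside the frame.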

Upvotes: 1
