Rafi Ramadhan

Reputation: 129

Scraping result is different from inspected DOM element

I want to parse a list of prices on a web page using Selenium WebDriver in Python. So I tried to fetch all the DOM elements using this code:

from selenium import webdriver

url = 'https://www.google.com/flights/explore/#explore;f=BDO;t=r-Asia-0x88d9b427c383bc81%253A0xb947211a2643e5ac;li=0;lx=2;d=2018-01-09'
driver = webdriver.Chrome()
driver.get(url)

# Print the HTML as Selenium sees it right after the page loads
print(driver.page_source)

The problem is that what I get from page_source is different from what I see when I inspect the element in the browser:

<div class="CTPFVNB-f-a">
    <div class="CTPFVNB-f-c"></div>
    <div class="CTPFVNB-f-d" elt="toolbelt"></div>
    <div class="CTPFVNB-f-e" elt="result">Here is the difference</div>
</div>

The difference is inside the element with class CTPFVNB-f-e. In the inspected DOM, this tag holds all the prices that I want to fetch, but in the output of page_source this part is missing.

Could anyone tell me what is wrong with my code? Or do I need further steps to parse the list of prices?

Upvotes: 1

Views: 455

Answers (1)

Keyur Potdar

Reputation: 7238

JavaScript modifies the page after it loads. Because you print the page source immediately after opening the page, you get the initial HTML, before the JavaScript has filled in the results.

You can do any one of the following things:

  • Add a delay: use time.sleep(x), where x is the number of seconds to wait; adjust it to your requirements. (NOT recommended)
  • Implicit wait: driver.implicitly_wait(x), where x is again in seconds. Note that this only affects element lookups such as find_element, so locate the result element before reading the page source.
  • Explicit wait: wait for the HTML element to appear and then get the page source. To learn how to do this, refer to this link. (HIGHLY recommended)

An explicit wait is the better option here because it waits only as long as it takes for the element to appear, so it adds no unnecessary delay. And if the page loads more slowly than expected, a fixed delay or an implicit wait that is too short won't give you the desired output.

Upvotes: 2
