Can't find Element with Selenium

I'm trying to scrape the price of a flight from the Google Flights website using Selenium, but the element does not show up anywhere, not even when I scrape the whole page. I've read that this might be because it is in a different frame, but how would I know which frame it is in?

Here is the website: https://www.google.es/flights?lite=0#flt=/m/0h3tv./m/05qtj.2018-12-14;c:EUR;e:1;a:FR;sd:1;t:f;tt:o

The price I'm looking for is: 32 €

And here is my code:

from bs4 import BeautifulSoup as soup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument("--headless")

d = webdriver.Chrome('/Users/davidgarciaballester/Desktop/chromedriver', options=chrome_options)

url='https://www.google.es/flights?lite=0#flt=/m/0h3tv./m/05qtj.2018-12-14;c:EUR;e:1;a:FR;sd:1;t:f;tt:o'
d.get(url)



precios = soup(d.page_source, 'html.parser').findAll('jsl',{'jstcache':'9322'})


print(precios)

d.quit()

Am I missing something? Thanks in advance.

EDIT 1: jstcache changed value to 9322

Upvotes: 1

Views: 1260

Answers (3)

OK, I figured out what was going on: I wasn't giving the driver enough time to load the page. I fixed this by pausing for a few seconds after loading the page.

Working code:

from bs4 import BeautifulSoup as soup
from selenium import webdriver
import time
from selenium.webdriver.chrome.options import Options



d = webdriver.Chrome('C:/Users/David/Desktop/chromedriver.exe')

url='https://www.google.es/flights?lite=0#flt=/m/0h3tv./m/05qtj.2018-12-14;c:EUR;e:1;a:FR;sd:1;t:f;tt:o'
d.get(url)

time.sleep(5)

page = soup(d.page_source, 'html.parser')

precios = page.findAll('jsl',{'jstcache':'9322'})

print(precios)

d.quit()
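
A fixed sleep either wastes time or is too short on a slow connection. As a minimal sketch of an alternative, the hypothetical helper below (`wait_for` is not part of Selenium) polls a callable until it returns something truthy or a timeout expires. Selenium also ships `WebDriverWait` in `selenium.webdriver.support.ui` for the same purpose; this version only assumes the standard library.

```python
import time


def wait_for(fetch, timeout=10.0, interval=0.5):
    """Call fetch() repeatedly until it returns a truthy value.

    Raises TimeoutError if nothing truthy comes back within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"nothing found within {timeout} seconds")
        time.sleep(interval)


# Hypothetical usage with the driver `d` and the parser `soup` from above:
# precios = wait_for(
#     lambda: soup(d.page_source, 'html.parser').findAll('jsl', {'jstcache': '9322'})
# )
```

Because `findAll` returns an empty list until the element is rendered, the loop ends as soon as the price appears rather than after a fixed delay.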

EDIT 1: As Idlehands pointed out, the jstcache number is probably dynamic and changes over time, so this approach was not well thought out. Instead, I'm now using the CSS selector combination QHarr suggested. Working code:

import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

whitelist = set('abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ 0123456789')

chrome_options = Options()
chrome_options.add_argument("--headless")

d = webdriver.Chrome('C:/Users/David/Desktop/chromedriver.exe', options=chrome_options)

url='https://www.google.es/flights?lite=0#flt=/m/0h3tv./m/05qtj.2018-12-14;c:EUR;e:1;a:FR;sd:1;t:f;tt:o'
d.get(url)

time.sleep(2)

precio = d.execute_script("return document.querySelector('.flt-subhead1.gws-flights-results__price.gws-flights-results__cheapest-price span + jsl')").text
precio = ''.join(filter(whitelist.__contains__, precio))

print(precio)

d.quit()
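
The whitelist filter above drops the currency symbol but still leaves a string. If a number is needed, for example to compare prices, a regular expression is one option. This is only a sketch: `parse_price` is a hypothetical helper, and it assumes prices such as "32 €" or "12,50 €" with no thousands separator.

```python
import re


def parse_price(text):
    """Return the first number found in text as a float, or None if there is none.

    Assumes an optional decimal part after ',' or '.' and no thousands separators.
    """
    match = re.search(r"\d+(?:[.,]\d+)?", text)
    if match is None:
        return None
    # Normalise a decimal comma to a decimal point before converting.
    return float(match.group().replace(",", "."))


print(parse_price("32 €"))
```

A value like "1.234 €" would be read as 1.234 under this assumption, so the pattern would need adjusting if larger prices are expected.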

Upvotes: 0

QHarr

Reputation: 84465

You can use the following CSS selector combination:

from selenium import webdriver

d = webdriver.Chrome()
d.get("https://www.google.es/flights?lite=0#flt=/m/0h3tv./m/05qtj.2018-12-14;c:EUR;e:1;a:FR;sd:1;t:f;tt:o")
item = d.execute_script("return document.querySelector('.flt-subhead1.gws-flights-results__price.gws-flights-results__cheapest-price span + jsl')")
print(item.text)
d.quit()

Upvotes: 4

chitown88

Reputation: 28565

from bs4 import BeautifulSoup as soup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options



d = webdriver.Chrome(r'C:\chromedriver_win32\chromedriver.exe')

url='https://www.google.es/flights?lite=0#flt=/m/0h3tv./m/05qtj.2018-12-14;c:EUR;e:1;a:FR;sd:1;t:f;tt:o'
d.get(url)

page = soup(d.page_source, 'html.parser')

precios = page.findAll('jsl',{'jstcache':'9322'})

print(precios)

d.quit()

This worked for me. Printing the first match:

print(precios[0].text)

gave me €32.

Upvotes: 2
