Reputation: 107
I'm trying to scrape the price of a flight from the Google Flights website using Selenium, but the element does not show up anywhere, not even when I scrape the whole page. I've read that this might be because it sits in a different frame, but how would I know which frame it is in?
Here is the website: https://www.google.es/flights?lite=0#flt=/m/0h3tv./m/05qtj.2018-12-14;c:EUR;e:1;a:FR;sd:1;t:f;tt:o
The price I'm looking for is: 32 €
And here is my code:
from bs4 import BeautifulSoup as soup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
chrome_options.add_argument("--headless")
d = webdriver.Chrome('/Users/davidgarciaballester/Desktop/chromedriver', options=chrome_options)
url='https://www.google.es/flights?lite=0#flt=/m/0h3tv./m/05qtj.2018-12-14;c:EUR;e:1;a:FR;sd:1;t:f;tt:o'
d.get(url)
precios = soup(d.page_source, 'html.parser').findAll('jsl',{'jstcache':'9322'})
print(precios)
d.quit();
Am I missing something? Thanks in advance.
EDIT 1: The jstcache value changed to 9322.
Upvotes: 1
Views: 1260
Reputation: 107
OK, I figured out what was going on: I wasn't giving the driver enough time to load the page. I fixed this by pausing for a few seconds after loading the page.
Working code:
from bs4 import BeautifulSoup as soup
from selenium import webdriver
import time
from selenium.webdriver.chrome.options import Options
d = webdriver.Chrome('C:/Users/David/Desktop/chromedriver.exe')
url='https://www.google.es/flights?lite=0#flt=/m/0h3tv./m/05qtj.2018-12-14;c:EUR;e:1;a:FR;sd:1;t:f;tt:o'
d.get(url)
time.sleep(5)
page = soup(d.page_source, 'html.parser')
precios = page.findAll('jsl',{'jstcache':'9322'})
print(precios)
d.quit()
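A fixed sleep either wastes time or is too short on a slow connection. Polling until the element shows up is one alternative; Selenium ships this idea as WebDriverWait in selenium.webdriver.support.ui. Below is a minimal, dependency-free sketch of the same pattern. The helper name wait_until and its parameters are made up for illustration and are not part of Selenium.

```python
import time


def wait_until(probe, timeout=10.0, interval=0.5):
    """Call probe() repeatedly until it returns a truthy value.

    Raises TimeoutError if nothing truthy is returned within `timeout` seconds.
    `probe` is any zero-argument callable (hypothetical helper, not a Selenium API).
    """
    deadline = time.monotonic() + timeout
    while True:
        result = probe()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        # Pause briefly before asking again so the loop does not spin
        time.sleep(interval)
```

With the driver above, the probe could be something like lambda: soup(d.page_source, 'html.parser').findAll('jsl', {'jstcache': '9322'}), which returns an empty (falsy) list until the price has been rendered.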
EDIT 1: As Idlehands pointed out, the jstcache number is probably dynamic and changes over time, so this approach was not well thought out. Instead, I'm now using the CSS selector combination QHarr suggested. Working code:
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
whitelist = set('abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ 0123456789')
chrome_options = Options()
chrome_options.add_argument("--headless")
d = webdriver.Chrome('C:/Users/David/Desktop/chromedriver.exe', options=chrome_options)
url='https://www.google.es/flights?lite=0#flt=/m/0h3tv./m/05qtj.2018-12-14;c:EUR;e:1;a:FR;sd:1;t:f;tt:o'
d.get(url)
time.sleep(2)
precio = d.execute_script("return document.querySelector('.flt-subhead1.gws-flights-results__price.gws-flights-results__cheapest-price span + jsl')").text
precio = ''.join(filter(whitelist.__contains__, precio))
print(precio)
d.quit()
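The whitelist filter leaves the price as a string (for "32 €" it yields "32 " with a trailing space). If a number is needed, a regular expression may be simpler. A minimal sketch, assuming whole-euro prices without thousands separators, like the 32 € in the question; parse_price is a name made up for illustration.

```python
import re


def parse_price(text):
    """Return the first run of digits in `text` as an int, or None if there is none.

    Assumes whole-number prices such as '32 €' or '€32'; decimals and
    thousands separators are deliberately not handled in this sketch.
    """
    match = re.search(r"\d+", text)
    return int(match.group()) if match else None


print(parse_price("32 €"))
```

Returning None instead of raising lets the caller decide what to do when the page layout changes and no price is found.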
Upvotes: 0
Reputation: 84465
You can use the following CSS selector combination:
from selenium import webdriver
d = webdriver.Chrome()
d.get("https://www.google.es/flights?lite=0#flt=/m/0h3tv./m/05qtj.2018-12-14;c:EUR;e:1;a:FR;sd:1;t:f;tt:o")
item = d.execute_script("return document.querySelector('.flt-subhead1.gws-flights-results__price.gws-flights-results__cheapest-price span + jsl')")
print(item.text)
d.quit()
Upvotes: 4
Reputation: 28565
from bs4 import BeautifulSoup as soup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
# Raw string so the backslashes in the Windows path are not treated as escape sequences
d = webdriver.Chrome(r'C:\chromedriver_win32\chromedriver.exe')
url='https://www.google.es/flights?lite=0#flt=/m/0h3tv./m/05qtj.2018-12-14;c:EUR;e:1;a:FR;sd:1;t:f;tt:o'
d.get(url)
page = soup(d.page_source, 'html.parser')
precios = page.findAll('jsl',{'jstcache':'9322'})
print(precios)
d.quit()
This worked for me as written. Adding:
print(precios[0].text)
gave me €32.
Upvotes: 2