Andrea Barnabò
Andrea Barnabò

Reputation: 564

Web scraping google flight prices

I am trying to learn to use the python library BeautifulSoup, I would like to, for example, scrape a price of a flight on Google Flights. So I connected to Google Flights, for example at this link, and I want to get the cheapest flight price.

So I would get the value inside the div with this class "gws-flights-results__itinerary-price" (as in the figure).

figure example

Here is the simple code I wrote:

from bs4 import BeautifulSoup
import urllib.request

url = 'https://www.google.com/flights?hl=it#flt=/m/07_pf./m/05qtj.2019-04-27;c:EUR;e:1;sd:1;t:f;tt:o'
page = urllib.request.urlopen(url)
soup = BeautifulSoup(page, 'html.parser')
div = soup.find('div', attrs={'class': 'gws-flights-results__itinerary-price'})

But the resulting div has class NoneType.

I also try with

find_all('div') 

but within all the div I found in this way, there was not the div I was interested in. Can someone help me?

Upvotes: 6

Views: 9597

Answers (3)

Hoossain Khan
Hoossain Khan

Reputation: 21

Its great that you are learning web scraping! The reason you are getting NoneType as a result is because the website that you are scraping loads content dynamically. When requests library fetches the url it only contains javascript. and the div with this class "gws-flights-results__itinerary-price" isn't rendered yet! So it won't be possible by the scraping approach you are using to scrape this website.

However you can use other methods such as fetching the page using tools such as selenium or splash to render the javascript and then parse the content.

Upvotes: 2

Punnerud
Punnerud

Reputation: 8051

BeautifulSoup is a great tool for extracting part of HTML or XML, but here it looks like you only need to get the url to another GET-request for a JSON object.

(I am not by a computer now, can update with an example tomorrow.)

Upvotes: 1

QHarr
QHarr

Reputation: 84465

Looks like javascript needs to run so use a method like selenium

from selenium import webdriver
url = 'https://www.google.com/flights?hl=it#flt=/m/07_pf./m/05qtj.2019-04-27;c:EUR;e:1;sd:1;t:f;tt:o'
driver = webdriver.Chrome()
driver.get(url)
print(driver.find_element_by_css_selector('.gws-flights-results__cheapest-price').text)
driver.quit()

Upvotes: 7

Related Questions