Reputation: 423
I'm learning to scrape and am trying it out on Airbnb (here's the page). When I inspect one of the home images using Google Chrome, I see this:
I can't get my script to return the HTML that represents the stuff pictured (e.g. the link to the listing). Initial attempt:
import requests
url = "https://www.airbnb.co.uk/s/Rome/homes?checkin=2017-11-12&checkout=2017-11-19"
landing = requests.get(url)
print landing.content.find("rooms/")
That just returns -1 (i.e. rooms/ isn't in the HTML).
Then some research threw up ideas about 'headers', so that Airbnb doesn't know I'm a script (the code is copy/pasted as I don't really get what these headers do). Someone else suggested using urllib instead. So the latest attempt is:
from urllib2 import Request,urlopen
user_agent = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.101 Safari/537.36'
headers = { 'User-Agent' : user_agent }
url = "https://www.airbnb.co.uk/s/Rome/homes?checkin=2017-11-12&checkout=2017-11-19"
req = Request(url,None,headers)
landing = urlopen(req)
print landing.read().find('rooms/')
This also returns a -1.
Any ideas are much appreciated. I'm using Python 2.7 (Windows).
Upvotes: 0
Views: 1027
Reputation: 1510
This happens because the listings are only loaded into your browser window by JavaScript after the initial request has finished. The HTML returned by that first request doesn't contain them yet; Airbnb populates the DOM of its pages afterwards.
In order to be able to scrape such pages, you will need more advanced tricks than simple requests, I'm afraid.
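To make the difference concrete, here is a tiny illustration. Both HTML strings are made-up stand-ins (not Airbnb's actual markup): the first mimics what a plain HTTP request gets back, the second what the DOM might look like once the script has run.

```python
from __future__ import print_function  # keeps print() working on Python 2.7 too

# Hypothetical raw response: an empty container plus a script that would
# fill it in later. This is NOT Airbnb's real markup.
raw_html = """
<html>
  <body>
    <div id="listings"></div>
    <script src="/static/app.js"></script>
  </body>
</html>
"""

# Hypothetical DOM after a browser has executed that script.
rendered_html = raw_html.replace(
    '<div id="listings"></div>',
    '<div id="listings"><a href="/rooms/12345">Flat in Rome</a></div>',
)

raw_index = raw_html.find("rooms/")
rendered_index = rendered_html.find("rooms/")

print(raw_index)            # -1: the link is not in what the script receives
print(rendered_index >= 0)  # True: the link only exists after the script has run
```

A script using requests or urllib2 only ever sees something like the first string, which is why find() gives -1.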
Two tips, if you're a beginner:
Good luck!
Upvotes: 2
Reputation: 1815
It happens because requests doesn't run JavaScript code (and neither does urllib2). As a result you can't find rooms/ in the response. You could use Selenium or Splash, which do execute the page's JavaScript.
If you open the page source in your browser and search for rooms/, you will find no results either.
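A minimal sketch of the Selenium route, assuming Selenium is installed and Chrome plus a matching chromedriver are available. The helper names (`extract_room_links`, `fetch_rendered_html`), the CSS selector and the "rooms/ followed by digits" pattern are my own assumptions for illustration; I haven't checked them against Airbnb's current markup.

```python
from __future__ import print_function  # keeps print() working on Python 2.7 too
import re

URL = "https://www.airbnb.co.uk/s/Rome/homes?checkin=2017-11-12&checkout=2017-11-19"


def extract_room_links(html):
    """Return the unique 'rooms/<id>' fragments found in an HTML string."""
    # Assumption: listing links contain 'rooms/' followed by a numeric id.
    seen = []
    for match in re.findall(r"rooms/\d+", html):
        if match not in seen:
            seen.append(match)
    return seen


def fetch_rendered_html(url, wait_seconds=10):
    """Load the page in a real browser so its JavaScript runs, then return the DOM as HTML."""
    # Imported here so the parsing helper above works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumes Chrome and chromedriver are set up
    try:
        driver.get(url)
        # Wait until at least one link whose href contains 'rooms/' is present.
        WebDriverWait(driver, wait_seconds).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, "a[href*='rooms/']"))
        )
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even if the wait times out
```

You would then call `print(extract_room_links(fetch_rendered_html(URL)))`. If the wait times out, the selector probably doesn't match how the listing links are currently marked up and needs adjusting.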
Upvotes: 3