I have built a very simple scraper that looks at Airbnb listings. The goal is to go through a given search results page (for example, the Copenhagen page used below).
import requests
from bs4 import BeautifulSoup

first_page = BeautifulSoup(requests.get("https://www.airbnb.com/s/Copenhagen--Denmark/homes?allow_override%5B%5D=&s_tag=kHqeQTpz&section_offset=1").text, 'html.parser')
listings = first_page.find_all('div', 'listing-card-wrapper')
for listing in listings:
    print(listing.select("#listing-15616363 > div.infoContainer_v72lrv > a > div.ellipsized_1iurgbx > div > span:nth-child(1) > span:nth-child(1)"))
The code correctly loops through the 18 listings on the page. However, it prints 18 empty lists, which indicates that the listing.select call matches nothing. I got the CSS selector from the "Copy selector" function in Chrome DevTools.
This is because listing-15616363 is specific to a single listing (note the format listing-{listing_id}), so the other listings you loop over contain no element with id='listing-15616363'.
For instance, if you want to fetch the URL of each listing, you can do something like this:
listing.find('a', class_="linkContainer_55zci1")['href']
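The idea above (match the listing-{listing_id} pattern or a shared class instead of one hard-coded id) can be tried without hitting Airbnb at all. The sketch below uses only the standard library's html.parser on a small hypothetical snippet shaped like the cards described in this thread; the markup, the CardParser name, and the assumption that each card carries id="listing-{id}" are illustrative, not taken from the live page. It also tests class membership token by token, which mirrors how BeautifulSoup's class_ argument matches any one of an element's classes.

```python
import re
from html.parser import HTMLParser

# Hypothetical markup modelled on the card structure discussed above; the
# live Airbnb page may differ, and its hashed class names change over time.
SAMPLE = """
<div class="listing-card-wrapper">
  <div id="listing-5316912">
    <a class="linkContainer_55zci1 extra" href="/rooms/5316912">Small City apt.</a>
  </div>
</div>
<div class="listing-card-wrapper">
  <div id="listing-16989400">
    <a class="linkContainer_55zci1" href="/rooms/16989400">Cozy room</a>
  </div>
</div>
"""

# Matches the per-listing id format listing-{listing_id}.
LISTING_ID = re.compile(r"^listing-(\d+)$")


class CardParser(HTMLParser):
    """Collect listing ids and link hrefs without hard-coding any single id."""

    def __init__(self, link_class):
        super().__init__()
        self.link_class = link_class
        self.ids = []
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attribute values may be None for bare attributes
        # Match the id *pattern* rather than one fixed id such as listing-15616363.
        match = LISTING_ID.match(attrs.get("id") or "")
        if match:
            self.ids.append(match.group(1))
        # "class" is a space-separated list, so test membership per token.
        classes = (attrs.get("class") or "").split()
        if tag == "a" and self.link_class in classes:
            self.hrefs.append(attrs.get("href"))


parser = CardParser("linkContainer_55zci1")
parser.feed(SAMPLE)
print(parser.ids)    # ['5316912', '16989400']
print(parser.hrefs)  # ['/rooms/5316912', '/rooms/16989400']
```

Because every card is handled by the same rule, the first card no longer gets special treatment, which is exactly what the copied #listing-15616363 selector could not avoid.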
Alternatively, you can use Python's lxml, which can be considerably faster than BeautifulSoup (if used properly). Something like this:
import requests
from lxml import html

url = "https://www.airbnb.com/s/Copenhagen--Denmark/homes?allow_override%5B%5D=&s_tag=kHqeQTpz&section_offset=1"
response = requests.get(url)
root = html.fromstring(response.content)
result_list = []

def remove_non_ascii(text):
    # Drop non-ASCII characters, e.g. the currency symbol in front of the price.
    return ''.join(i if ord(i) < 128 else '' for i in text)

# Take the currency from the first offers/priceCurrency <meta> tag on the page.
currency = root.xpath('//div[@itemprop="offers"]/meta[@itemprop="priceCurrency"]/@content')[0].strip()

for row in root.xpath('//div[contains(@class, "listing-card-wrapper")]'):
    if len(row):  # skip wrappers that have no child elements
        href = row.xpath('.//a[@class="linkContainer_55zci1"]/@href')[0].strip()
        title = row.xpath('.//div[@class="ellipsized_1iurgbx"]/span/text()')[0].strip()
        price = remove_non_ascii(row.xpath('.//div[@class="inline_g86r3e"]/span//text()')[0].strip())
        result_list.append({'url': "https://www.airbnb.com" + href,
                            'title': title, 'price': price, 'currency': currency})

print(result_list)
This will result in the following (output from Python 2, hence the u'' prefixes):
[{'url': 'https://www.airbnb.com/rooms/5316912', 'currency': 'INR', 'price': u' 3,823', 'title': 'Small City apt. next to the Metro'},
 {'url': 'https://www.airbnb.com/rooms/16989400', 'currency': 'INR', 'price': u' 2,347', 'title': 'Cozy room close to city center'},
 {'url': 'https://www.airbnb.com/rooms/17628374', 'currency': 'INR', 'price': u' 6,774', 'title': 'Cosy, quiet apartment in downtown Copenhagen'},
 {'url': 'https://www.airbnb.com/rooms/1206721', 'currency': 'INR', 'price': u' 4,426', 'title': 'Apt.close to Metro, Airport and CHP'},
 {'url': 'https://www.airbnb.com/rooms/13813273', 'currency': 'INR', 'price': u' 3,622', 'title': 'Large room in Vesterbro'},
 {'url': 'https://www.airbnb.com/rooms/14083881', 'currency': 'INR', 'price': u' 9,322', 'title': 'City Room'},
 {'url': 'https://www.airbnb.com/rooms/6221130', 'currency': 'INR', 'price': u' 5,365', 'title': 'cosy flat 2 min from Central Statio'},
 {'url': 'https://www.airbnb.com/rooms/15804159', 'currency': 'INR', 'price': u' 3,823', 'title': 'Cozy, central near waterfront. Quality breakfast!'},
 {'url': 'https://www.airbnb.com/rooms/17266268', 'currency': 'INR', 'price': u' 3,756', 'title': 'Cosy room in Frederiksberg'},
 {'url': 'https://www.airbnb.com/rooms/2647233', 'currency': 'INR', 'price': u' 3,353', 'title': 'Bedroom & Living Room Frederiksberg'},
 {'url': 'https://www.airbnb.com/rooms/12083235', 'currency': 'INR', 'price': u' 5,969', 'title': 'Wonderful Copenhagen is right here'},
 {'url': 'https://www.airbnb.com/rooms/7787976', 'currency': 'INR', 'price': u' 7,042', 'title': 'Homely renovated flat with garden'},
 {'url': 'https://www.airbnb.com/rooms/17556785', 'currency': 'INR', 'price': u' 1,610', 'title': u'Small Cosy home above our Caf\xe9 ( Breakfast incl )'},
 {'url': 'https://www.airbnb.com/rooms/894420', 'currency': 'INR', 'price': u' 10,261', 'title': 'Wonderful apt. right in the city!'},
 {'url': 'https://www.airbnb.com/rooms/17028460', 'currency': 'INR', 'price': u' 7,847', 'title': 'Nyhavn 3-bed apartment for families'},
 {'url': 'https://www.airbnb.com/rooms/17651114', 'currency': 'INR', 'price': u' 6,371', 'title': 'Spacious place by canals in heart of Copenhagen'},
 {'url': 'https://www.airbnb.com/rooms/10564051', 'currency': 'INR', 'price': u' 3,420', 'title': u'\u623f\u95f4\u5728\u54e5\u672c\u54c8\u6839\u7684\u5fc3\u810f'},
 {'url': 'https://www.airbnb.com/rooms/17709435', 'currency': 'INR', 'price': u' 2,951', 'title': u'Hyggelig lejlighed t\xe6t p\xe5 centrum.'}]
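The price strings in this output are still text: they keep a leading space (what remains once remove_non_ascii has dropped the currency symbol, since .strip() runs before it) and a comma thousands separator. If you want numbers, a small post-processing step is enough. The parse_price helper below is hypothetical and assumes whole-number prices with ',' as the thousands separator, as in the sample output; other locales or decimal prices would need different handling.

```python
import re


def parse_price(text):
    """Turn a scraped price such as ' 3,823' or '\u20b93,823' into an int.

    Hypothetical helper: assumes a whole-number price whose only
    non-digit characters are spaces, separators, or a currency symbol.
    """
    digits = re.sub(r"\D", "", text)  # keep only the digits
    if not digits:
        raise ValueError(f"no digits in price {text!r}")
    return int(digits)


prices = [' 3,823', ' 10,261', '\u20b93,823']
print([parse_price(p) for p in prices])  # [3823, 10261, 3823]
```

Because it keeps only digits, this also works if you skip remove_non_ascii and pass the raw text that still contains the currency symbol.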
You can also refer to the lxml documentation on HTML parsing and XPath for further details.
When web scraping, try to use XPath or specific element attributes instead of the CSS selectors copied from DevTools, because those selectors are often so specific that they only match one element.
Instead of using CSS selectors, I've managed to achieve what you want by using the itemprop attribute in the following code:
Code:
from bs4 import BeautifulSoup
import requests

html_source = requests.get("https://www.airbnb.com/s/Copenhagen--Denmark/homes?allow_override%5B%5D=&s_tag=kHqeQTpz&section_offset=1").text
first_page = BeautifulSoup(html_source, 'html.parser')
# Each search result is a list item marked with itemprop="itemListElement".
listings = first_page.find_all('div', {'itemprop': 'itemListElement'})
for listing in listings:
    # The next three <meta> tags hold the name, position and URL, in that order.
    a = listing.find_next('meta')
    b = a.find_next('meta')
    c = b.find_next('meta')
    print("Name: ", a['content'])
    print("Position: ", b['content'])
    print("URL: ", c['content'])
    print("-" * 15)
Output:
Name: Small City apt. next to the Metro - Apartment - København
Position: 1
URL: www.airbnb.com/rooms/5316912
---------------
Name: Cozy room close to city center - Apartment - Frederiksberg
Position: 2
URL: www.airbnb.com/rooms/16989400
---------------
Name: Cosy, quiet apartment in downtown Copenhagen - Apartment - København
Position: 3
URL: www.airbnb.com/rooms/17628374
---------------
Name: Apt.close to Metro, Airport and CHP - Apartment - Copenhagen
Position: 4
URL: www.airbnb.com/rooms/1206721
---------------
Name: Large room in Vesterbro - Apartment - København
Position: 5
URL: www.airbnb.com/rooms/13813273
---------------
Name: City Room - Apartment - København
Position: 6
URL: www.airbnb.com/rooms/14083881
---------------
Name: cosy flat 2 min from Central Statio - Apartment - København V
Position: 7
URL: www.airbnb.com/rooms/6221130
---------------
Name: Cozy, central near waterfront. Quality breakfast! - Apartment - København
Position: 8
URL: www.airbnb.com/rooms/15804159
---------------
Name: Cosy room in Frederiksberg - Apartment - Frederiksberg
Position: 9
URL: www.airbnb.com/rooms/17266268
---------------
Name: Bedroom & Living Room Frederiksberg - Apartment - Frederiksberg
Position: 10
URL: www.airbnb.com/rooms/2647233
---------------
Name: Wonderful Copenhagen is right here - Apartment - København
Position: 11
URL: www.airbnb.com/rooms/12083235
---------------
Name: Homely renovated flat with garden - Apartment - Frederiksberg
Position: 12
URL: www.airbnb.com/rooms/7787976
---------------
Name: Small Cosy home above our Café ( Breakfast incl ) - Bed & Breakfast - København
Position: 13
URL: www.airbnb.com/rooms/17556785
---------------
Name: Wonderful apt. right in the city! - Apartment - Copenhagen
Position: 14
URL: www.airbnb.com/rooms/894420
---------------
Name: Nyhavn 3-bed apartment for families - Apartment - Copenhagen
Position: 15
URL: www.airbnb.com/rooms/17028460
---------------
Name: Spacious place by canals in heart of Copenhagen - Apartment - København
Position: 16
URL: www.airbnb.com/rooms/17651114
---------------
Name: 房间在哥本哈根的心脏 - Apartment - København
Position: 17
URL: www.airbnb.com/rooms/10564051
---------------
Name: Hyggelig lejlighed tæt på centrum. - Apartment - København
Position: 18
URL: www.airbnb.com/rooms/17709435
---------------
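The loop above relies on the name, position and URL <meta> tags always appearing in that order. A sketch that looks each value up by its itemprop name instead is shown below; it uses the standard library's html.parser, and the sample markup and the ItemListParser name are hypothetical (the second item deliberately lists its <meta> tags in a different order to show that order no longer matters).

```python
from html.parser import HTMLParser

# Hypothetical markup shaped like the itemListElement blocks used above.
SAMPLE = """
<div itemprop="itemListElement" itemscope itemtype="http://schema.org/ListItem">
  <meta itemprop="name" content="Small City apt. next to the Metro - Apartment - København">
  <meta itemprop="position" content="1">
  <meta itemprop="url" content="www.airbnb.com/rooms/5316912">
</div>
<div itemprop="itemListElement" itemscope itemtype="http://schema.org/ListItem">
  <meta itemprop="url" content="www.airbnb.com/rooms/16989400">
  <meta itemprop="position" content="2">
  <meta itemprop="name" content="Cozy room close to city center - Apartment - Frederiksberg">
</div>
"""


class ItemListParser(HTMLParser):
    """Group <meta itemprop=... content=...> values per itemListElement.

    Simplification: each <meta> is attributed to the most recently opened
    itemListElement, so this assumes the meta tags sit inside their item.
    """

    def __init__(self):
        super().__init__()
        self.items = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("itemprop") == "itemListElement":
            self.items.append({})  # start a new record for each list item
        elif tag == "meta" and self.items and attrs.get("itemprop"):
            # Key by itemprop name, so tag order does not matter.
            self.items[-1][attrs["itemprop"]] = attrs.get("content")


parser = ItemListParser()
parser.feed(SAMPLE)
for item in parser.items:
    print(item["position"], item["name"], item["url"])
```

In BeautifulSoup itself the same idea would be listing.find('meta', itemprop='name')['content'], since find accepts attribute filters as keyword arguments.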