Reputation: 11
I'm trying to scrape some property listings from a list of website URLs. I wrote simple code to get data from one URL, but when I try with a list ['url1','url2'] I get nothing as the result. I also tried reading the URLs from a CSV file, but the result is still empty. I checked a lot of similar topics, but I'm still stuck. Could you please help me understand how to do it?
'''
import requests
from bs4 import BeautifulSoup

url = 'https://www.zillow.com/homedetails/105-Itasca-St-Boston-MA-02126/59137872_zpid/'

req_headers = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.8',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'
}

with requests.Session() as s:
    r = s.get(url, headers=req_headers)
    soup = BeautifulSoup(r.content, 'lxml')
    price = soup.find('span', {'class': 'ds-value'}).text
    property_type = soup.find('span', {'class': 'ds-home-fact-value'}).text
    address = soup.find('h1', {'class': 'ds-address-container'}).text

price, property_type, address
'''
Upvotes: 0
Views: 675
Reputation: 305
To accomplish what you're asking with multiple URLs, all you need to do is put them in a list and iterate over it:
import requests
from bs4 import BeautifulSoup

urls = [
    'https://www.zillow.com/homedetails/105-Itasca-St-Boston-MA-02126/59137872_zpid/',
]

with requests.Session() as s:
    for url in urls:
        r = s.get(url)
        soup = BeautifulSoup(r.text, 'html.parser')
        # do something with soup
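To keep the results from each URL rather than discarding them, the parsing step can append a dict per page to a list, guarding against elements that aren't found. A minimal sketch, reusing the ds-* class names from your question; the HTML snippet below is a made-up stand-in for a real response body:

```python
from bs4 import BeautifulSoup

# Stand-in for r.text; in the real loop this comes from s.get(url).
sample_html = """
<html><body>
  <span class="ds-value">$450,000</span>
  <span class="ds-home-fact-value">Single Family</span>
  <h1 class="ds-address-container">105 Itasca St, Boston, MA 02126</h1>
</body></html>
"""

def parse_listing(html):
    """Extract price, type, and address; None for anything missing."""
    soup = BeautifulSoup(html, 'html.parser')

    def text_or_none(tag, cls):
        el = soup.find(tag, {'class': cls})
        return el.get_text(strip=True) if el else None

    return {
        'price': text_or_none('span', 'ds-value'),
        'property_type': text_or_none('span', 'ds-home-fact-value'),
        'address': text_or_none('h1', 'ds-address-container'),
    }

# In the real loop: results.append(parse_listing(r.text)) per URL.
results = [parse_listing(sample_html)]
print(results[0])
```

The `text_or_none` helper matters because `soup.find(...)` returns `None` when a class isn't present, and calling `.text` on `None` raises an `AttributeError` that would abort the whole loop.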
However, the main issue here is that pretty much everything interesting on your example webpage seems to be generated by JavaScript. For example, if you:
print(soup.body)
you'll see that the HTML body for this webpage contains next to nothing (no price, no house details, etc.), save for a captcha mechanism to verify you're human. You'll need to find a way to wait for the JavaScript to be rendered on the page before you can scrape the details. Look into the Python module selenium as a potential workaround for accomplishing this.
Upvotes: 2