Reputation: 35
I am trying to parse a webpage with bs4 but the elements I am trying to access all have different class names. Example: class='list-item listing … id-12984' and class='list-item listing … id-10359'
import requests
from bs4 import BeautifulSoup

def preownedaston(url):
    preownedaston_resp = requests.get(url)
    if preownedaston_resp.status_code == 200:
        bs = BeautifulSoup(preownedaston_resp.text, 'lxml')
        posts = bs.find_all('div', class_='')  # don't know what to put here
        for p in posts:
            title_year = p.find('div', class_='inset').find('a').find('span', class_='model_year').text
            print(title_year)

preownedaston('https://preowned.astonmartin.com/preowned-cars/search/?finance%5B%5D=price&price-currency%5B%5D=EUR&custom-model%5B404%5D%5B%5D=809&continent-country%5B%5D=France&postcode-area=United%20Kingdom&distance%5B%5D=0&transmission%5B%5D=Manual&budget-program%5B%5D=pay&section%5B%5D=109&order=-usd_price&pageId=3760')
Is there a way to match a partial class name, like class_='list-item'?
Upvotes: 1
Views: 984
Reputation: 46759
The listing data for this page is actually returned in JSON format from a separate URL, which means you can easily extract the fields you want. For example:
import requests

url = "https://preowned.astonmartin.com/ajax/stock-listing/get-items/pageId/3760/ratio/3_2/taxBandImageLink/aHR0cHM6Ly9kMnBwMTFwZ29wNWY2cC5jbG91ZGZyb250Lm5ldC9UYXhCYW5kLSV0YXhfYmFuZCUuanBn/taxBandImageHyperlink/JWRlYWxlcl9lbWFpbCU=/imgWidth/767/?finance%5B%5D=price&price-currency%5B%5D=EUR&custom-model%5B404%5D%5B%5D=809&continent-country%5B%5D=France&distance%5B%5D=0&transmission%5B%5D=Manual&budget-program%5B%5D=pay&section%5B%5D=109&order=-usd_price&pageId=3760"
r = requests.get(url)
data = r.json()
details = ['make', 'mileage', 'model', 'model_year', 'mpg', 'exterior_colour', 'price_now']

for vehicle in data['vehicles']:
    print()
    for key in details:
        print(f"{key:18} : {vehicle[key]}")
This displays the following:
make               : Aston Martin
mileage            : 42,000 km
model              : V12 Vantage
model_year         : 2011
mpg                : 17.3
exterior_colour    : Carbon Black
price_now          : €114,900

make               : Aston Martin
mileage            : 42,000 km
model              : V12 Vantage
model_year         : 2011
mpg                : 17.3
exterior_colour    : Carbon Black
price_now          : €99,900
Note: it might be necessary to add a User-Agent request header if the data is not returned. If you print the data variable, you can see all of the available information for each vehicle.
This approach avoids the need for JavaScript processing via Selenium and also avoids parsing any HTML with BeautifulSoup. The URL was found using the browser's network tools whilst the page was loading.
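As a rough sketch of the User-Agent note above: the helper names fetch_vehicles and summarise and the header value are illustrative assumptions, not part of the original answer. Only the 'vehicles' key is taken from the code above. Using .get() avoids a KeyError when a listing lacks one of the requested fields.

```python
import requests

# Assumed browser-like header; any realistic User-Agent string should do.
HEADERS = {"User-Agent": "Mozilla/5.0"}


def fetch_vehicles(url, timeout=10):
    """Request the JSON endpoint and return the list stored under 'vehicles'."""
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.json().get("vehicles", [])


def summarise(vehicle, keys):
    """Format the chosen fields, tolerating keys that are missing."""
    return [f"{key:18} : {vehicle.get(key, 'n/a')}" for key in keys]


# Offline demonstration with a hand-written record (not real API output).
sample = {"make": "Aston Martin", "model_year": "2011"}
lines = summarise(sample, ["make", "model_year", "mpg"])
print("\n".join(lines))
```

In real use, the list returned by fetch_vehicles(url) would be looped over and each entry passed to summarise.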
Upvotes: 2
Reputation: 1710
The CSS selector for matching part of an attribute's value is as follows:
div[class*='list-item'] # *= matches any div whose class attribute contains this substring
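As a small illustration of this selector with bs4, here is a sketch that runs against a made-up HTML fragment (the markup is an assumption for demonstration, not the real page). It also shows that BeautifulSoup treats class as a multi-valued attribute, so class_='list-item' already matches any element carrying that class among others.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the class pattern from the question.
html = """
<div class="list-item listing id-12984"><span class="model_year">2011</span></div>
<div class="list-item listing id-10359"><span class="model_year">2011</span></div>
<div class="other"><span class="model_year">1999</span></div>
"""

soup = BeautifulSoup(html, "html.parser")

# Substring match on the whole class attribute value.
by_substring = soup.select("div[class*='list-item']")

# Token match: class is multi-valued, so a single class name is enough.
by_token = soup.find_all("div", class_="list-item")

for post in by_substring:
    # The last class token carries the per-listing id in this sample.
    print(post["class"][-1], post.find("span", class_="model_year").text)
```

Both lookups select the same two div elements here; the substring form would additionally match a class such as 'my-list-item-2', which the token form would not.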
But if you look at the source code of the page, you will see that the content you are trying to scrape is generated by JavaScript, so you have three options here.
I prefer this one in similar situations because you will be parsing JSON:
import json

import requests
from bs4 import BeautifulSoup

URL = 'https://preowned.astonmartin.com/preowned-cars/search/?finance%5B%5D=price&price-currency%5B%5D=EUR&custom-model%5B404%5D%5B%5D=809&continent-country%5B%5D=France&postcode-area=United%20Kingdom&distance%5B%5D=0&transmission%5B%5D=Manual&budget-program%5B%5D=pay&section%5B%5D=109&order=-usd_price&pageId=3760'
page = requests.get(URL, headers={"User-Agent": "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36"})
soup = BeautifulSoup(page.text, 'html.parser')
json_obj = soup.find('script', {'type': "application/ld+json"}).text
#{"@context":"http://schema.org","@graph":[{"@type":"Brand","name":""},{"@type":"OfferCatalog","itemListElement":[{"@type":"Offer","name":"Pre-Owned By Aston Martin","price":"€114,900.00","url":"https://preowned.astonmartin.com/preowned-cars/12984-aston-martin-v12-vantage-v8-volante/","itemOffered":{"@type":"Car","name":"Aston Martin V12 Vantage V8 Volante","brand":"Aston Martin","model":"V12 Vantage","itemCondition":"Used","category":"Used","productionDate":"2010","releaseDate":"2011","bodyType":"6.0 Litre V12","emissionsCO2":"388","fuelType":"Obsidian Black","mileageFromOdometer":"42000","modelDate":"2011","seatingCapacity":"2","speed":"190","vehicleEngine":"6l","vehicleInteriorColor":"Obsidian Black","color":"Black"}},{"@type":"Offer","name":"Pre-Owned By Aston Martin","price":"€99,900.00","url":"https://preowned.astonmartin.com/preowned-cars/10359-aston-martin-v12-vantage-carbon-edition-coupe/","itemOffered":{"@type":"Car","name":"Aston Martin V12 Vantage Carbon Edition Coupe","brand":"Aston Martin","model":"V12 Vantage","itemCondition":"Used","category":"Used","productionDate":"2011","releaseDate":"2011","bodyType":"6.0 Litre V12","emissionsCO2":"388","fuelType":"Obsidian Black","mileageFromOdometer":"42000","modelDate":"2011","seatingCapacity":"2","speed":"190","vehicleEngine":"6l","vehicleInteriorColor":"Obsidian Black","color":"Black"}}]},{"@type":"BreadcrumbList","itemListElement":[{"@type":"ListItem","position":"1","item":{"@id":"https://preowned.astonmartin.com/","name":"Homepage"}},{"@type":"ListItem","position":"2","item":{"@id":"https://preowned.astonmartin.com/preowned-cars/","name":"Pre-Owned Cars"}},{"@type":"ListItem","position":"3","item":{"@id":"//preowned.astonmartin.com/preowned-cars/search/","name":"Pre-Owned By Aston Martin"}}]}]}
items = json.loads(json_obj)['@graph'][1]['itemListElement']
for item in items:
    print(item['itemOffered']['name'])
Output:
Aston Martin V12 Vantage V8 Volante
Aston Martin V12 Vantage Carbon Edition Coupe
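The hard-coded index ['@graph'][1] depends on the order of the entries in the JSON-LD. As a sketch, assuming the JSON-LD keeps the shape shown in the comment above, a hypothetical helper offer_names could look up the OfferCatalog entry by its @type instead. The sample string below is a trimmed, hand-written copy of that structure.

```python
import json


def offer_names(json_ld_text):
    """Return the car names from every OfferCatalog entry in the @graph."""
    graph = json.loads(json_ld_text).get("@graph", [])
    names = []
    for node in graph:
        if node.get("@type") != "OfferCatalog":
            continue  # skip Brand, BreadcrumbList and any other entries
        for offer in node.get("itemListElement", []):
            names.append(offer["itemOffered"]["name"])
    return names


# Trimmed-down sample shaped like the page's JSON-LD (for illustration only).
sample = json.dumps({
    "@context": "http://schema.org",
    "@graph": [
        {"@type": "Brand", "name": ""},
        {"@type": "OfferCatalog", "itemListElement": [
            {"@type": "Offer",
             "itemOffered": {"@type": "Car",
                             "name": "Aston Martin V12 Vantage V8 Volante"}},
        ]},
    ],
})

print(offer_names(sample))
```

With the real page, offer_names(json_obj) would take the place of the indexing line.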
Upvotes: 3