Reputation: 169
I am using this Python code to extract car details. I managed to extract everything except the image source (URL), which I am having trouble with. You can see my code below:
import requests
from bs4 import BeautifulSoup
URL = "https://www.cars.com/shopping/results/?dealer_id=&keyword=&list_price_max=&list_price_min=&makes[]=&maximum_distance=all&mileage_max=&page=1&page_size=100&sort=best_match_desc&stock_type=cpo&year_max=&year_min=&zip="
r = requests.get(URL)
soup = BeautifulSoup(r.content, 'html.parser')
cars = soup.find_all('div', class_='vehicle-card')
imag = soup.find_all('img')
name = []
mileage = []
dealer_name = []
rating = []
rating_count = []
price = []
for car in cars:
    #name
    name.append(car.find('h2').get_text())
    #mileage
    mileage.append(car.find('div', {'class':'mileage'}).get_text())
    #dealer_name
    dealer_name.append(car.find('div', {'class':'dealer-name'}).get_text())
    #rate
    rating.append(car.find('span', {'class':'sds-rating__count'}))
    #rate_count
    rating_count.append(car.find('span', {'class':'sds-rating__link'}).get_text())
    #price
    price.append(car.find('span', {'class':'primary-price'}).get_text())
for image in imag:
    #img
    img_url = image["src"]
    print(img_url)
I am getting the following output, which ends in an error:
/images/placeholder_10x10.png
/images/placeholder_10x10.png
/images/placeholder_10x10.png
/images/placeholder_10x10.png
/images/placeholder_10x10.png
/images/placeholder_10x10.png
/images/placeholder_10x10.png
/images/placeholder_10x10.png
/images/placeholder_10x10.png
/images/placeholder_10x10.png
/images/placeholder_10x10.png
/images/placeholder_10x10.png
/images/placeholder_10x10.png
/images/placeholder_10x10.png
/images/placeholder_10x10.png
/images/placeholder_10x10.png
Traceback (most recent call last):
  File "c:\scraping\scrapeData.py", line 33, in <module>
    img_url = image["src"]
  File "C:\Users\PC\AppData\Local\Programs\Python\Python310\lib\site-packages\bs4\element.py", line 1486, in __getitem__
    return self.attrs[key]
KeyError: 'src'
Does anyone have an idea what I am missing here?
Upvotes: 1
Views: 701
Reputation: 16147
You need to debug your list. Evidently at least one img tag in there doesn't have a src attribute.
The following tries to read src from each tag and, if that raises a KeyError because the attribute is missing, prints the tag instead.
for x in imag:
    try:
        d = x['src']
    except KeyError:
        print(x)
Output
<img alt="App Store download" class="app-store-button js-lazy-load" data-src="https://beta.cstatic-images.com/medium/in/v2/static/mobile-apps/app-store-badge-us-black-1.png"/>
<img alt="Google Play download" class="google-play-button js-lazy-load" data-src="https://beta.cstatic-images.com/medium/in/v2/static/mobile-apps/google-play-badge-us-1.png"/>
As you can see, these tags have a data-src attribute, not src.
This would parse them all. Whether they are useful to you or not is another story:
for image in imag:
    try:
        img_url = image["src"]
    except KeyError:
        # Could just `continue` here if you don't want these
        img_url = image['data-src']
    print(img_url)
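If you would rather avoid try/except, here is a minimal sketch of the same idea using .get(), which returns None when an attribute is missing. A few things in it are my assumptions and not from the question: the helper name pick_image_url is made up, BASE_URL is a guess at the site root for resolving relative paths, and preferring data-src over src assumes that lazy-loaded images keep the real URL in data-src while src holds only a placeholder. The sample dicts stand in for tags so the snippet runs on its own. A BeautifulSoup Tag also has .get(), so you should be able to pass image directly.

```python
from urllib.parse import urljoin

# Assumption: relative paths are resolved against the site root.
BASE_URL = "https://www.cars.com"


def pick_image_url(attrs, base_url=BASE_URL):
    """Return an absolute image URL from a tag's attributes, or None."""
    # Assumption: prefer data-src, since src may only hold a placeholder.
    url = attrs.get("data-src") or attrs.get("src")
    if url is None:
        return None
    # urljoin leaves absolute URLs untouched and resolves relative ones.
    return urljoin(base_url, url)


# Plain dicts standing in for bs4 tags (hypothetical sample data
# modelled on the output shown in this thread).
samples = [
    {"src": "/images/placeholder_10x10.png"},
    {
        "alt": "App Store download",
        "data-src": "https://beta.cstatic-images.com/medium/in/v2/static/mobile-apps/app-store-badge-us-black-1.png",
    },
    {"alt": "tag with neither attribute"},
]

urls = [pick_image_url(attrs) for attrs in samples]
for url in urls:
    print(url)
```

In your loop that would be something like img_url = pick_image_url(image), skipping the tag when the result is None.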
Upvotes: 1